Microservices - Software Architecture - Pattern & Techniques
Advantages
- Low coupling
- Improves modularity
- Enables parallel development
- Supports scalability
Drawbacks
- Infrastructure costs are usually higher
- Integration testing is more complex
- Each service must be managed and deployed as its own unit, which adds operational overhead
- Nanoservice anti-pattern (services that are too fine-grained)
Use Cases:
1.] A bank is looking to develop a system for its new web platform. This will be a long-term project and will therefore need to be scalable to support future growth.
Ans] Yes, this is a suitable application of microservices architecture: it is a long-term project, and a well-designed microservices-based system scales well.
2.] An e-commerce website requires a small application that will be used for a short period to support a temporary promotion scheme.
Ans] This scenario is the opposite of the previous one. Since the application is not complex and will only be used for a short time, it can be built with a less development-intensive architecture such as a tiered architecture.
3.] A blogging website with complex functionality, currently built on a monolithic architecture, is looking into doing a quick refactoring to improve code quality.
Ans] Although the blogging website has complex functionality and is a long-term project, the company only has time and resources for a 'quick' refactoring. Microservices architecture is therefore not applicable here, as the migration would take a significant amount of time.
Why do most microservice projects fail?
- Lack of planning, knowledge, skills and time
How can you prevent your project from failing?
- Check whether microservices are applicable to the project
- Prefer automation
- Prepare a clear plan
- Avoid common design mistakes
Microservice template:
1] Importance:
- Setting up a microservice takes a significant amount of time
- The setup code is similar for each microservice
- Cross-cutting concerns apply to every microservice
2] Template should contain:
- Cross-cutting concerns
- Logging
- Metrics
- Connection setup and configuration for databases and message brokers
- Project structure (unit tests, domain-driven structure, etc.)
Here are some templates for different programming languages. If you'd like me to find you a template for a programming language not listed here, please let me know in the Q&A section of the course and I'll do my best to find a suitable one:
Java
https://github.com/Nike-Inc/riposte-microservice-template
https://github.com/overture-stack/microservice-template-java
C#
https://github.com/AdrienTorris/aspnet-core-simple-microservices-sample
https://github.com/3pillarlabs/core-microservices-template
Python
https://github.com/python-microservices/microservices-template
https://github.com/austinjung/python-microservices
I recommend finding a template that fits your project model and requirements as closely as possible, copying it (or forking it on GitHub), then adding and removing common features as needed. Perhaps you will need to use a custom communication or authentication protocol in all microservices, or your company uses a custom logging library: this is the place to set it up.
Code base repository setup:
1] Mono Repo
Advantages:
- Easier to keep input/output contracts in sync
- The entire project repo can be versioned with a single build number
Disadvantages:
- Different teams working in the same code base can break each other's builds, disrupting the CI/CD process
- Loose coupling is harder to enforce, so the code tends toward a tightly coupled model
- Long build times, since a large code repo has to be downloaded, with a higher chance of failures and rework
2] Separate Repo
Advantages:
- Different teams can work on different repositories, so development can proceed in parallel
- The scope of a single repo is clearer
Disadvantages:
- Contract versioning becomes more complex
- Unless managed properly, the separate repos can still end up behaving like a monolith
- Higher up-front cost of setting up repos and CI/CD pipelines
Microservice Decomposition:
Decompose by business capability first, then split further into fine-grained services based on functional and technical modules.
e.g. an e-commerce site:
1] Order management --> Order History, Placement, Tracking, Dispute
2] Shopping Cart Management --> Cart Upselling, promotion, calculator etc.
Use cases:
1] A hospital monitoring system, where sensors monitor patients' vitals and raise an alert if these do not match the patient's healthy range of vitals' statistics.
The main microservices you should include are:
Patient Microservice
Vitals Monitoring Microservice
Alert Microservice
2] An airline booking system where customers create an account and book flights, and which allocates seats to customers.
The main microservices you should include are:
Customer microservice
Seating microservice
Flight booking microservice
3] A blogging site where users can post articles on their blogs, and other users can comment on these articles.
The main microservices you should include are:
User microservice
Blog microservice
Article microservice
Comment microservice
Communications:
1. Remote Procedure Invocation (RPI) -> Synchronous calls, mostly made through web services or REST APIs.
2. Asynchronous Message-Based Communication -> A message bus (RabbitMQ/Kafka) is used when a publish/subscribe pattern is required.
3. Custom or Domain-Specific Protocols -> SMTP or FTP can be used, depending on the requirement.
Use cases:
1] In a payment gateway system, when we receive a request to withdraw funds from a customer's account, we check if funds are available and process the request if funds are available. An email is queued to be sent to the customer notifying them of the request.
Checking if the funds are available is a synchronous flow, as this must be completed before we can process the withdrawal request.
Since the email is queued (i.e. the system doesn't wait for this to be sent before responding to the payment request) this is an example of an asynchronous flow.
2] In an e-commerce system, when a customer places an order successfully we show a successful response to the customer on screen. Data related to the order is also published to analytics components for business insights.
The response being shown to the customer is a synchronous process and the publishing of the order data to analytics components is also a synchronous process.
However, the analytics components processing this order data for analytics insights is done asynchronously.
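The payment gateway use case above can be sketched as follows. This is a minimal illustration rather than a production design: the in-memory `balances` dictionary, the `email_queue` and the `withdraw` function are hypothetical names standing in for a real account store and message broker.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the account store and the email queue.
balances = {"cust-1": 100.0}
email_queue = queue.Queue()
sent_emails = []


def email_worker():
    """Consume queued notifications in the background (asynchronous flow)."""
    while True:
        message = email_queue.get()
        if message is None:  # sentinel value: stop the worker
            break
        sent_emails.append(message)  # a real worker would send the email here


def withdraw(customer_id, amount):
    """Process a withdrawal request and report whether it was approved."""
    # Synchronous flow: the balance check must finish before we can proceed.
    if balances.get(customer_id, 0.0) < amount:
        return False
    balances[customer_id] -= amount
    # Asynchronous flow: enqueue the email and return without waiting for it.
    email_queue.put(f"Withdrawal of {amount} requested for {customer_id}")
    return True


worker = threading.Thread(target=email_worker)
worker.start()
approved = withdraw("cust-1", 40.0)
rejected = withdraw("cust-1", 500.0)
email_queue.put(None)  # ask the worker to stop once the queue is drained
worker.join()
```

The caller gets its answer as soon as `withdraw` returns, while the notification is delivered whenever the worker gets to it.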
Microservice Registry:
Keeps track of every available service instance across the project.
On service startup -> Register
On service shutdown -> Remove
At regular intervals -> Health Check/metrics
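The three registry operations above can be sketched with a small in-memory class. This is only an illustration under assumptions: the `ServiceRegistry` name, its methods and the 30-second heartbeat timeout are hypothetical, and real registries such as the ones listed later persist and replicate this state.

```python
import time
from typing import Dict, List, Optional


class ServiceRegistry:
    """In-memory registry of service instances (illustrative only)."""

    def __init__(self, heartbeat_timeout: float = 30.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        # service name -> {instance address -> time of last successful check}
        self._instances: Dict[str, Dict[str, float]] = {}

    def _now(self, now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def register(self, service: str, address: str, now: Optional[float] = None) -> None:
        """Called by an instance on startup."""
        self._instances.setdefault(service, {})[address] = self._now(now)

    def deregister(self, service: str, address: str) -> None:
        """Called by an instance on shutdown."""
        self._instances.get(service, {}).pop(address, None)

    def heartbeat(self, service: str, address: str, now: Optional[float] = None) -> None:
        """Record a successful health check for a registered instance."""
        if address in self._instances.get(service, {}):
            self._instances[service][address] = self._now(now)

    def healthy_instances(self, service: str, now: Optional[float] = None) -> List[str]:
        """Return instances whose last health check is recent enough."""
        current = self._now(now)
        return sorted(
            address
            for address, last_seen in self._instances.get(service, {}).items()
            if current - last_seen <= self.heartbeat_timeout
        )


registry = ServiceRegistry(heartbeat_timeout=30.0)
registry.register("orders", "10.0.0.1:8080", now=0.0)
registry.register("orders", "10.0.0.2:8080", now=0.0)
registry.heartbeat("orders", "10.0.0.1:8080", now=25.0)
alive = registry.healthy_instances("orders", now=40.0)
registry.deregister("orders", "10.0.0.1:8080")
after_shutdown = registry.healthy_instances("orders", now=40.0)
```

The explicit `now` arguments exist only to make the example deterministic; in normal use they would be omitted.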
Microservice Discovery:
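Client side discovery -> The calling MS queries the registry itself and sends its request directly to one of the available service instances
Server side discovery -> The calling MS sends its request to a load balancer endpoint, which queries the registry and forwards the request to an available instance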
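As a rough sketch of the difference between the two styles, assuming a hypothetical `REGISTRY` mapping and a simple round-robin selection (real load balancers offer other strategies):

```python
import itertools

# Hypothetical registry contents: service name -> addresses of healthy instances.
REGISTRY = {"orders": ["10.0.0.1:8080", "10.0.0.2:8080"]}


class RoundRobinPicker:
    """Looks up the instances of a service and picks the next one in turn."""

    def __init__(self, registry):
        self._registry = registry
        self._counters = {}

    def choose(self, service):
        instances = self._registry.get(service, [])
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        counter = self._counters.setdefault(service, itertools.count())
        return instances[next(counter) % len(instances)]


class LoadBalancer:
    """Server-side discovery: callers only know this single endpoint."""

    def __init__(self, registry):
        self._picker = RoundRobinPicker(registry)

    def forward(self, service, request):
        target = self._picker.choose(service)
        return f"{request} -> {target}"  # a real balancer would proxy the request


# Client-side discovery: the calling service does the lookup and selection itself.
client_picker = RoundRobinPicker(REGISTRY)
picks = [client_picker.choose("orders") for _ in range(3)]

# Server-side discovery: the calling service just talks to the load balancer.
balancer = LoadBalancer(REGISTRY)
forwarded = balancer.forward("orders", "GET /orders/42")
```

In the client-side case every caller needs registry access and selection logic; in the server-side case that logic lives in one place, at the cost of an extra network hop.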
Service Registration & Discovery - Getting Started
Now that we've learned about service registration and discovery, here are some of the most commonly used components for this functionality that you can consider using in your microservice environment:
Consul - https://www.consul.io/
A service mesh solution that provides a strongly consistent data store that can be used not only for service discovery purposes but also for health checking (we'll get to this in a later lecture!), as a configuration server and offers multi-datacenter support out of the box. Each of these features can be used independently. Consul is a popular choice not only because of its relative simplicity to set up but also because of its reliability, scalability and supporting features.
Netflix Eureka - https://github.com/Netflix/eureka
Eureka, which is part of the open-sourced Netflix stack, can best be described as an AWS service registry for resilient mid-tier load balancing and failover. This is a good alternative if a simple load balancer is needed in a cloud environment; however, for additional features such as monitoring and configuration, you will either have to complement it with additional tools or look at other alternatives.
Apache Zookeeper - https://zookeeper.apache.org/
Often used with Apache Curator (http://curator.apache.org/), Zookeeper was one of the first tools to be used for distributed service coordination. As a result, it is very mature, robust and has many features available out of the box. In fact, it's used by quite a few big names in the industry such as YouTube and eBay.
However, being one of the older tools out there it's showing its age. Compared to its competitors it consumes significantly more resources and is more time-consuming to set up and maintain due to its dependencies and complexity. You most likely will end up using barely any of the extra features that are available, so you'll be suffering the hit of these disadvantages without any real benefit. As a result, it is a much less common choice nowadays and is usually found in more established companies where alternatives were not available at the time.
Although you may not yet be in a position to actually require the use of a service registry/discovery component, I encourage you to take a brief look into the above components to help you get a better understanding of the functionalities they provide and how they help us maintain seamless communication.
Databases:
1] Shared DB
- A single database is shared by all microservices for every transaction
- Prone to deadlocks and performance issues
- Not a good fit for complex projects
2] Multiple DB (Separate DB per service)
- Each MS has its own schema and DB
- Different MS can use different database technologies, such as MongoDB, SQL Server, Oracle, etc.
API Composition:
A common service can be created that acts as a composer between multiple MS: it fetches data from each service's own DB and combines/joins the results as required.
e.g. Both services below are called by a common "ProductRecommendation" service, which acts as the API composer:
ProductService --> Product DB
OrderHistoryService --> Order History DB
If the dataset is large, this is not a feasible solution because performance degrades.
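A minimal sketch of the "ProductRecommendation" composer described above. The two dictionaries and the service functions are hypothetical stand-ins for real services and their databases, and the join logic (ranking products by how often they were ordered) is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for two services, each backed by its own database.
PRODUCT_DB = {1: {"name": "Keyboard"}, 2: {"name": "Mouse"}, 3: {"name": "Monitor"}}
ORDER_HISTORY_DB = {"cust-1": [1, 1, 3]}


def product_service_get(product_ids):
    """ProductService: return product details for the requested IDs."""
    return {pid: PRODUCT_DB[pid] for pid in product_ids if pid in PRODUCT_DB}


def order_history_service_get(customer_id):
    """OrderHistoryService: return the product IDs a customer has ordered."""
    return list(ORDER_HISTORY_DB.get(customer_id, []))


def product_recommendation(customer_id):
    """API composer: call both services, then join the results in memory."""
    ordered_ids = order_history_service_get(customer_id)
    products = product_service_get(set(ordered_ids))
    counts = Counter(ordered_ids)
    return [
        {"product_id": pid, "name": products[pid]["name"], "times_ordered": n}
        for pid, n in counts.most_common()
        if pid in products
    ]


recommendations = product_recommendation("cust-1")
```

Because the join happens in the composer's memory, every record it needs has to be transferred first, which is why this approach suffers with large datasets.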
Event Sourcing:
Use an event store that also acts as a message broker: services publish their events to it, and the other MS subscribe to it to receive status updates as required.
e.g. Each service logs every event with a date/time stamp, and other services can use this log to check the status at a given point in time.
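The idea can be sketched with a tiny in-memory event store. The `Event` and `EventStore` classes and the `current_status` helper are hypothetical names, and a real event store would persist events durably rather than keep them in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str
    data: dict
    # Every event is stamped with the time it was recorded.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventStore:
    """Append-only log of events that also notifies subscribers."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, event):
        self._events.append(event)  # events are never updated or deleted
        for handler in self._subscribers:
            handler(event)

    def events_for(self, entity_id):
        return [e for e in self._events if e.entity_id == entity_id]


def current_status(store, order_id):
    """Rebuild the current state by replaying the order's events in sequence."""
    status = None
    for event in store.events_for(order_id):
        status = event.data.get("status", status)
    return status


store = EventStore()
seen_by_shipping = []
store.subscribe(lambda event: seen_by_shipping.append(event.kind))
store.append(Event("order-1", "OrderPlaced", {"status": "placed"}))
store.append(Event("order-1", "OrderPaid", {"status": "paid"}))
store.append(Event("order-2", "OrderPlaced", {"status": "placed"}))
status = current_status(store, "order-1")
```

The state of an order is not stored anywhere directly; it is derived from the sequence of events, which also serves as an audit trail.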
Event Sourcing - Getting Started!
Event sourcing is rapidly gaining popularity, particularly in large companies with enterprise systems that need to handle a high volume of concurrent events, so becoming more familiar with it is definitely a good investment both for your personal knowledge and your future career prospects! Although we went through the core principles in the previous lecture, we didn't go into technical details. Therefore, I'll now introduce you to some of the most well-known event stores and event sourcing frameworks to help you get started.
AxonIQ (Java) - https://axoniq.io/
Axon is an open-source Java framework and server that supports event-driven microservices. It provides implementations of the most important building blocks when developing event-driven micro-services following domain-driven design, such as aggregates, command and event buses, and repositories. It is a robust and reliable framework that has a free version as well as an enterprise edition with some extra features, however, the free edition already offers more than you'll need to get started. Specifically for event-sourcing, here's an official article explaining how the Axon Server fits in as an event store when using the Axon framework to implement event sourcing: https://axoniq.io/resources/event-sourcing.
Event Store (All programming languages) - https://eventstore.com/
If you prefer not to use a full framework, or if you just want to build most of the stuff yourself to get a better understanding of what's going on - a good option is to download the Event Store database that was built specifically with event sourcing in mind. It's free and open-source, with clients for different languages as well as access to functionality via an HTTP web API, making it a viable option to use with just about any programming language.
Event Sourcing in Python - https://github.com/johnbywater/eventsourcing
A well-maintained library that facilitates event sourcing in Python. Apart from an event store, it also supplies features such as concurrency control and snapshotting. The Github repository also contains comprehensive documentation and examples, making it easy to get started.
NEventStore (C#) - https://github.com/NEventStore/NEventStore
NEventStore is a persistence library used to abstract different storage implementations when using event sourcing as a storage mechanism. It contains enterprise-ready features, is very well documented and is actively maintained. It might look daunting at first but it's definitely a viable long term solution.
Equinox Project (C#) - https://github.com/EduardoPires/EquinoxProject
Although not specifically dedicated to event sourcing, the Equinox Project still deserves a mention as it's definitely a good starting point for C# developers. This is an open-source project written in .NET Core, that implements the most commonly used technologies including a good foundation for event sourcing. On the other hand, it doesn't include a fully-fledged event store as NEventStore does, so if you're specifically after an event store then I recommend taking a look at NEventStore first.
If you use any other coding language and would like my opinion on a particular event store/event sourcing library, ping me in the Q&A section and I'll do my best to help out.
Once you've spent some time playing around with an event store, it's a good time to enhance your microservice template with some generic functionality that will most likely be present in multiple microservices. This may include anything from connecting to the event store and writing events, to handling snapshots, recovering from event store network disconnections and more complex features.
Two Phase Commit:
Distributed transactions are more complicated, because committing or rolling back the entire transaction across several services makes it harder to keep data consistent.
To preserve data integrity in this scenario, we can use the two-phase commit method.
Phase-1: Commit Request
- Coordinator sends a query-to-commit message
- Services execute the transaction but do not commit
- Services reply YES/NO depending on whether the transaction was successful
Phase-2: Commit
- If all services replied YES ->
  - Coordinator sends a commit message
  - Services commit the transaction
  - Services reply with an acknowledgement
- If at least one service replied NO ->
  - Coordinator sends a rollback message
  - Services roll back the transaction
It is a blocking process, which can result in deadlock situations.
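The two phases above can be sketched as follows. The `Participant` class and `two_phase_commit` function are hypothetical, everything runs in a single process, and the sketch ignores timeouts and coordinator failure, which are exactly where the blocking behaviour shows up in practice.

```python
class Participant:
    """A service taking part in the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        """Phase 1: execute the work without committing, then vote YES/NO."""
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Coordinator logic: returns True if the transaction was committed."""
    # Phase 1: commit request. Every participant is asked to prepare and vote.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was YES, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


orders, payments = Participant("orders"), Participant("payments")
committed = two_phase_commit([orders, payments])

stock = Participant("stock")
billing = Participant("billing", can_commit=False)
aborted = two_phase_commit([stock, billing])
```

Note that a prepared participant has to hold its resources until the coordinator's decision arrives, which is the source of the blocking.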
Saga method:
A saga is a sequence of local transactions, each committed to its own service's DB and coordinated through a message broker.
Choreography based sagas:
A message broker is connected to each MS, and the services use the publish/subscribe model to keep track of each other's status, which overcomes the two-phase commit problem.
Here the outcome of one service is passed on to the next service.
Orchestrator based sagas:
Here an orchestrator, which is where the saga is created, coordinates all the services as a single point of contact. If any service is unable to complete its transaction, the orchestrator terminates the saga and no further steps are executed.
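A minimal sketch of an orchestrator-based saga. The `run_saga` function and the step names are hypothetical. One assumption goes beyond the notes above: when a step fails, the orchestrator also runs a compensating action for each step that already completed, which is how sagas are commonly described as undoing earlier local transactions.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order; return (ok, log)."""
    log = []
    completed = []
    for name, action, compensation in steps:
        try:
            action()  # local transaction in one service
            log.append(f"{name}: done")
            completed.append((name, compensation))
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            # Assumption: undo the completed steps in reverse order.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log  # the saga terminates; later steps never run
    return True, log


state = {"order": None, "payment": None}


def create_order():
    state["order"] = "created"


def cancel_order():
    state["order"] = "cancelled"


def charge_payment():
    raise RuntimeError("card declined")


def refund_payment():
    state["payment"] = "refunded"


ok, log = run_saga([
    ("create_order", create_order, cancel_order),
    ("charge_payment", charge_payment, refund_payment),
])
```

Unlike two-phase commit, no service waits in a prepared state: each local transaction commits immediately and is undone later if needed.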
Use cases:
1] In an e-commerce system, the order information needs to be combined with the list of warehouses to identify the closest location. The list of warehouses is short and can be cached for long durations as it does not change often.
The API composition pattern will fit this scenario well, as the quantity of data being requested from the order microservice and warehouse microservice is small and we can also take advantage of in memory caching for additional optimization.
2] In an e-commerce store, a service needs to combine data for the hourly product and order data. The business is large, with a high number of orders per hour and a large number of products in its database.
Since the volume of data that needs to be loaded by the service is potentially very large, the API composition pattern is not a suitable candidate. A suitable solution would be to use event sourcing, and have this service subscribe to both product and order related events in order to keep an updated view of the required data whilst avoiding bulk data transfers that are both time consuming and also resource inefficient.
Fault tolerance and Monitoring mechanisms:
An interconnected system should be up and running at all times. Given the number of hardware and software components involved, we need a proper monitoring system in place that provides data in real time.
Circuit Breaker:
This pattern helps prevent failures in some part of the network or in a particular microservice from bringing down the entire system.
When a service sends a synchronous request to another service it's possible that the other service is unavailable or under too much load to respond in a reasonable period. As more requests are received, the number of threads blocked waiting for a response from the service continues to pile up and further reduces the chance of the service recovering.
Additionally, these blocked threads can result in resource exhaustion leading to errors cascading to other services. This can potentially cascade throughout the whole system rendering it unusable.
To prevent such a scenario from cascading to other components, services requesting other microservices in a synchronous manner should do so through a module known as a circuit breaker. This module keeps track of the number of consecutive requests to a service that have failed and if a certain threshold is exceeded any request to the service will fail immediately for a timeout period.
Once the timeout period has passed, a limited number of requests are performed to see if the service has recovered.
If so, then normal operation is resumed. Otherwise the timeout is restarted.
The use of a timeout period avoids continuing to overload the service and improves the chance of it recovering.
It also prevents potential resource exhaustion issues from cascading throughout the system.
There is no recommended timeout value, as this depends on normal processing times in a specific system, but as a rule of thumb it is usually set to a few seconds, with this value increasing each time the timeout passes and the service is still unavailable.
The circuit breaker is a cross-cutting module that should be included in all microservices performing synchronous requests to other microservices. We can therefore consider including it in the microservice template, where it can be removed if a service does not require it.
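To make the behaviour described above concrete, here is a minimal single-threaded sketch. The `CircuitBreaker` class, its parameter names and the default values are hypothetical; it opens once consecutive failures reach the threshold, uses a fixed timeout rather than an increasing one, and is not thread-safe, so it is for understanding the pattern only.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.timeout:
                # Fail immediately instead of waiting on an unhealthy service.
                raise CircuitOpenError("service presumed unavailable")
            # Timeout has passed: let this request through as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or restart the timeout
            raise
        # Success: resume normal operation.
        self._failures = 0
        self._opened_at = None
        return result


now = [0.0]  # fake clock so the example does not need to sleep
breaker = CircuitBreaker(failure_threshold=2, timeout=5.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("service down")


def healthy():
    return "ok"


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

now[0] = 6.0  # the timeout period has passed
recovered = breaker.call(healthy)
```

The third call is rejected without ever reaching the failing service, which is what relieves the pressure on it.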
Circuit breakers - Getting Started
Now that we've learned about circuit breakers, you may be thinking: we can just go ahead and code our own to add to our microservice template right?
Well, that's definitely possible but many people have already spent time coding circuit breaker libraries, perfecting them and sharing them with the open-source community, so I don't recommend trying to re-invent the wheel and introducing the risk of new bugs in your system. It's not worth the time and effort - unless you've got a novel idea of course in which case go ahead and share it with the community!
We'll go through some of the more popular circuit breaker libraries and I'll also link a few additional ones that you can take a look at before choosing one for your microservice template.
Netflix Hystrix - https://github.com/Netflix/Hystrix
Part of the Netflix open source stack, this Java library is designed to handle latency and fault tolerance when accessing remote parts of the system, services, or third parties. By applying the circuit breaker pattern, it helps prevent failures in one part of the system from cascading throughout the whole system, making it more robust and resilient to failure. It also allowed for:
- rapid recovery
- real-time monitoring and alerting
- making use of fallbacks to retain limited functionality in abnormal circumstances
Whilst this was considered one of the most reliable libraries at one point, Netflix announced that they are putting the project on maintenance mode and are moving towards active projects such as resilience4j (https://github.com/resilience4j/resilience4j) rather than continuing to actively improve Hystrix.
Therefore, if you were considering Hystrix I would recommend going directly to resilience4j since Hystrix won't be actively maintained long term.
Sentinel - https://github.com/alibaba/Sentinel
Open-sourced by Alibaba in 2018, this is another mature library that offers similar functionality to Hystrix whilst being a bit more efficient in how it manages resources by avoiding thread pool isolation per dependency which Hystrix does. In practice, this overhead is generally negligible from a user perspective, however. The project is actively maintained and is a good option to consider if you are using Java.
As you can see, Java developers are quite spoilt for choice when it comes to well-known circuit breaker libraries! Let's take a look at some options if you're using other programming languages:
Python
PyBreaker (https://github.com/danielfm/pybreaker) - a circuit breaker implementation that has been around a while and is still actively maintained. It guarantees thread safety and also has some useful additional features such as optional Redis backing.
C#
Polly (https://github.com/App-vNext/Polly) is one of the most popular and reliable circuit breaker implementations that allows the user to describe retry and fallback policies in a fluent manner whilst guaranteeing thread safety. It's well documented and actively maintained.
If you use any other coding language and would like my opinion on a particular circuit breaker implementation, ping me in the Q&A section and I'll do my best to help out.
Once you begin forming your microservice template, I encourage you to include in it the setup and default configuration of your preferred circuit breaker, as it will most likely be used in a significant number of components.
Health Check API:
The health check API is a route that should be available on services that expose an API endpoint. This route can be called to obtain the service status.
The standard response codes that are generally used are 200 (OK) if the service is healthy and 500 if the service is in an error state. However, custom responses can also be added if you'd like to provide more information to the consumers of the health check API.
The health check API is usually called frequently by services such as the service registry to check whether the service is still available to receive traffic, as we discussed in earlier lectures. It is also commonly used by monitoring services in order to raise alerts if the queried service is in an unhealthy state.
As the health check API behaves similarly across all services and should be included in all microservices exposing an API endpoint, it is a good candidate for a module to be included in the microservices template.
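A minimal sketch of such a route using only the Python standard library. The `/health` path, the `CHECKS` dictionary and the shape of the JSON body are assumptions made for illustration; in a real service you would use whatever your web framework and monitoring tools expect.

```python
import json
from http.server import BaseHTTPRequestHandler


def health_response(checks):
    """Run each named check and return (HTTP status code, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "error"
        except Exception:
            results[name] = "error"  # a crashing check counts as unhealthy
    healthy = all(value == "ok" for value in results.values())
    body = json.dumps({"status": "healthy" if healthy else "unhealthy", "checks": results})
    return (200 if healthy else 500), body


# Hypothetical dependency checks; replace with real database/broker pings.
CHECKS = {"database": lambda: True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


ok_status, ok_body = health_response({"database": lambda: True})
bad_status, bad_body = health_response({"database": lambda: False, "broker": lambda: 1 / 0})
```

To serve the route, the handler could be passed to `http.server.HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()`; the port is arbitrary.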
Logging Techniques:
This section covers logging techniques that improve the effectiveness of our logging, ensuring that navigating logs is a manageable task and that we have sufficient detail to extract useful information.
In a microservices architecture, a single request can end up going through many microservices. If we are going through the logs of a microservice and find that some data was incorrectly processed, it is likely that we would need to trace the request to the microservice that sent it and perhaps even to other microservices further up the chain. With just the request payload and time information in hand, it may be possible to track the originating request to other microservices, but this would be very time-consuming.
To solve this issue and to be able to trace the path of a request across microservices effectively, we make use of a globally unique identifier, also known as a GUID. As the name implies, a GUID is a unique random value which we assign as the request identifier at the origin of the request as it enters our system as shown in the diagram. We then include this request identifier field when sending related requests to microservices to be able to link them to each other easily.
This request identifier is also included in any logging activity so that we are able to trace the actions related to a particular request across microservices. It is also useful for calculating end-to-end performance, i.e. how long it took to process a request in our system, and for identifying potential bottlenecks.
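One possible way to attach a request identifier to every log line, sketched with the standard `logging`, `uuid` and `contextvars` modules. The logger name, the log format and the `handle_request` function are hypothetical; how the identifier travels between services (for example as an HTTP header) is left out.

```python
import contextvars
import io
import logging
import uuid

# Holds the identifier of the request currently being processed.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request identifier onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(stream):
    logger = logging.getLogger("orders")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, incoming_request_id=None):
    """Reuse the caller's identifier, or create one at the system's entry point."""
    request_id = incoming_request_id or str(uuid.uuid4())
    token = request_id_var.set(request_id)
    try:
        logger.info("processing order")
        return request_id  # forward this value on any downstream call
    finally:
        request_id_var.reset(token)


stream = io.StringIO()
logger = build_logger(stream)
first = handle_request(logger)  # request enters the system
second = handle_request(logger, incoming_request_id=first)  # downstream service
lines = stream.getvalue().splitlines()
```

Both log lines carry the same `request_id`, so a search for that value returns the full path of the request.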
Although using a request identifier solves the problem of relating different logs together, navigating through log files is still a tedious task due to the number of microservice instances located on different servers, each with its own log files. To be able to go through logs effectively, we need to use some form of log aggregation technology where we can view and query the logs for all microservices in a single view.
Technologies that allow this functionality include the Elasticsearch, Logstash, Kibana (ELK) stack, Splunk and AWS CloudWatch. Using a combination of request identifiers and log aggregation techniques, it is much easier to trace requests in our microservices system. Many log aggregation technologies also allow us to set up monitoring alerts, which are useful to track certain events such as network errors.
They may also allow the user to create dashboards with graphs built from the log data, which can be used to show statistics and metrics.
Use cases:
1] In a payment gateway, a request to withdraw funds has failed unexpectedly. This was reported by the customer, and we have the following data available:
- Customer ID, Amount requested, Approximate time of the request
First off, we will rely on the log aggregation technology, using the customer ID, amount and time as filters to query the logs in order to identify the entry point of the withdrawal request into our system.
Once this has been identified, we can use the request identifier related to this request to trace its flow across microservices to see where it is failing.
2] In a blogging website, our error logs indicate that an article creation microservice is failing regularly. You are required to help identify potential causes.
One approach to this scenario would be to query a list of all error logs, use their request identifiers to obtain the original requests as they entered the system and, with this data in hand, examine the request payloads for common denominators that could immediately point to the root cause. Failing that, we could take some of the sample payloads that caused the error and replay them to replicate the failure.
3] In an e-commerce system, a customer is not able to register a new account. We have attempted to trace their details in our aggregated logs; however, there was no trace of this request.
This was partially a trick question, but one that will help train your logical reasoning for practical situations. If there is absolutely no record of the request in our logs, then one of the following is the most probable cause and we'll need to investigate each accordingly:
- The request did not make it to our system at all: there may have been some network error or UI error on the customer's browser that prevented the request from being sent
- The log aggregation technology may not be picking up all logs. This is unfortunately a possibility, especially if resources are under heavy usage, and hence monitoring of the log aggregation itself is also necessary. It would also be a good idea to ensure that logs are being kept for adequate periods and that they are not being discarded due to storage capacity limitations.
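The two-step investigation from the first use case (find the entry point, then follow the request identifier) can be sketched over a list of log records. The `LOGS` sample data, the field names and the ten-minute search window are all made up for illustration; a real log aggregation tool would express the same filters in its own query language.

```python
from datetime import datetime, timedelta

# Hypothetical aggregated log records from several microservices.
LOGS = [
    {"time": datetime(2024, 1, 5, 10, 0, 1), "service": "gateway", "request_id": "r-1",
     "customer_id": "c-7", "amount": 50, "level": "INFO", "message": "withdrawal received"},
    {"time": datetime(2024, 1, 5, 10, 0, 2), "service": "accounts", "request_id": "r-1",
     "level": "INFO", "message": "funds available"},
    {"time": datetime(2024, 1, 5, 10, 0, 2), "service": "gateway", "request_id": "r-2",
     "customer_id": "c-9", "amount": 50, "level": "INFO", "message": "withdrawal received"},
    {"time": datetime(2024, 1, 5, 10, 0, 3), "service": "ledger", "request_id": "r-1",
     "level": "ERROR", "message": "write timed out"},
]


def find_entry_points(logs, customer_id, amount, around, window=timedelta(minutes=10)):
    """Step 1: filter by customer ID, amount and approximate time."""
    return [
        record for record in logs
        if record.get("customer_id") == customer_id
        and record.get("amount") == amount
        and abs(record["time"] - around) <= window
    ]


def trace(logs, request_id):
    """Step 2: follow one request identifier across all services, in time order."""
    matching = [record for record in logs if record["request_id"] == request_id]
    return sorted(matching, key=lambda record: record["time"])


entry = find_entry_points(LOGS, "c-7", 50, datetime(2024, 1, 5, 10, 5))[0]
path = [(record["service"], record["level"]) for record in trace(LOGS, entry["request_id"])]
```

Here the trace ends with an error in the hypothetical `ledger` service, which is where the investigation would continue.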