If you are working with microservices and the database-per-service pattern, SAGA is a topic you will have to think about. With the database-per-service pattern, every service has its own database.
Imagine you have a microservices-based e-commerce application. From the user's perspective, she simply wants to pay for the products in her shopping cart. It looks like a single operation or transaction. In the background, however, there are multiple transactions in different services (for example reserving stock, taking the payment and arranging delivery), and they must be combined into a single larger transaction.
First of all, we need to understand what ACID means in database systems.
Atomicity means a transaction either succeeds or fails as a whole. There is no partial success or partial failure.
Consistency ensures that a transaction can only bring the database from one valid state to another valid state. Data must be valid according to all defined rules, such as constraints, triggers and cascades.
Isolation: in database systems, transactions usually run concurrently, and they should not interfere with each other. Ideally, running them concurrently leaves the database in the same state as running them one after another. In practice, many databases let you configure the isolation level, trading some isolation for performance.
Durability means that once a transaction is committed, its data is guaranteed to be persisted, even if the system crashes right after the commit.
Traditional Solution: Two Phase Commit (2PC)
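I will not go into details here, but 2PC has some drawbacks:
- Many NoSQL databases do not support it
- It is a blocking protocol: if the coordinator fails after the prepare phase, participants can be left holding locks until it recovers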
Now what is SAGA?
SAGA is a design pattern for solving the distributed transaction problem. It was introduced in 1987 to manage long-lived transactions. When working with microservices and database per service, SAGAs come to the rescue: a saga is a sequence of local transactions, one per service, and each of them commits in its own database. Because there are multiple database instances, SAGAs are ACD (Atomicity, Consistency, Durability) without Isolation. A SAGA can bring the system to a consistent state, but only eventually: the system reaches a consistent state after a period of time, not immediately.
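To make "ACD without Isolation" concrete, here is a minimal Python sketch, assuming three hypothetical services whose databases are plain in-memory dicts. Each step commits locally and is visible immediately, so a reader looking between steps sees a partially applied saga; only after the last step is the system consistent again.

```python
# Hypothetical in-memory "databases", one per service (database per service).
stock_db = {"sku-1": 500}
payment_db = {}   # order_id -> amount charged
delivery_db = {}  # order_id -> delivery status

# Each step is a local transaction that touches exactly one database.
def reserve_stock(order):
    stock_db[order["sku"]] -= order["qty"]

def take_payment(order):
    payment_db[order["id"]] = order["amount"]

def schedule_delivery(order):
    delivery_db[order["id"]] = "SCHEDULED"

def snapshot(order):
    # What an outside reader sees right now across all three databases.
    return (stock_db[order["sku"]], order["id"] in payment_db, order["id"] in delivery_db)

def run_saga(order, steps):
    observed = []
    for step in steps:
        step(order)                       # local commit, visible immediately
        observed.append(snapshot(order))  # intermediate states are not isolated
    return observed

order = {"id": "o-1", "sku": "sku-1", "qty": 1, "amount": 30}
states = run_saga(order, [reserve_stock, take_payment, schedule_delivery])
for s in states:
    print(s)
```

The first two snapshots are the "inconsistent for a while" window; the third is the eventually consistent end state.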
Without isolation, concurrent transactions can affect each other; for example, one saga may read data that another saga will later compensate. Some countermeasures can improve isolation:
- Commutative updates: Suppose the stock amount is 500 and the transaction decreases it by 1. If the service applies a relative update ("decrease by 1") instead of writing an absolute value, it does not affect other transactions. If the SAGA fails and a rollback occurs, the service increases the amount by 1. The amount does not need to return to 499 during rollback; another transaction may have changed it to 480 or anything else in the meantime. Because increments and decrements can be applied in any order with the same result, they improve isolation.
- Holding state: Keeping a state on records can improve isolation. Setting a record's state to PENDING signals that the SAGA transaction is not completed yet, so other transactions can treat the record with care.
- Version number: Keeping a version number on each record enables optimistic locking: an update succeeds only if the version has not changed since the record was read, so conflicting concurrent updates are detected.
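The three countermeasures can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions: `Product`, `Order`, `adjust_stock`, `update_if_version` and `VersionConflict` are hypothetical names, not part of any framework.

```python
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    stock: int
    version: int = 0

@dataclass
class Order:
    order_id: str
    state: str = "PENDING"  # holding state: the saga is not finished yet

class VersionConflict(Exception):
    pass

def adjust_stock(product, delta):
    # Commutative update: apply a relative change instead of writing an absolute value.
    product.stock += delta
    product.version += 1

def update_if_version(product, expected_version, new_stock):
    # Optimistic locking: reject the write if the record changed since it was read.
    if product.version != expected_version:
        raise VersionConflict(f"expected v{expected_version}, found v{product.version}")
    product.stock = new_stock
    product.version += 1

p = Product("sku-1", 500)
order = Order("o-1")       # starts as PENDING
adjust_stock(p, -1)        # our saga reserves 1 unit: 499
adjust_stock(p, -19)       # another saga reserves 19 units: 480
adjust_stock(p, +1)        # our saga fails and compensates: 481, not 499
order.state = "CANCELLED"  # release the PENDING marker
print(p.stock)             # 481

stale_version = p.version  # read the record
adjust_stock(p, -5)        # a concurrent writer changes it (476, version bumped)
try:
    update_if_version(p, stale_version, 100)
    conflict = False
except VersionConflict:
    conflict = True        # the stale write is detected and rejected
print(conflict)            # True
```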
There are two common ways to implement SAGAs: Choreography and Coordinator (also known as orchestration).
Choreography based SAGA
With choreography, microservices publish events and the other participants subscribe to those events. In our e-commerce example, the Delivery service subscribes to stockReserved and paymentDone events and publishes deliveryReserved and deliveryDone events. Each microservice knows which events to listen to and what to do with them, without a central controller. That’s why it is called choreography. For simple SAGAs, choreography can be a good choice, but as the system grows more complex it becomes harder to maintain and debug.
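Below is a minimal Python sketch of the Delivery example, assuming a hypothetical in-memory `EventBus` in place of a real message broker and a hypothetical `orderCreated` event that starts the flow. Dispatch is synchronous here for simplicity; a real broker would deliver events asynchronously.

```python
from collections import defaultdict

class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # every published event, in order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()

# Stock service: reserves stock when an order is created.
def on_order_created(order):
    bus.publish("stockReserved", order)

# Delivery service: reserves a slot once stock is reserved...
def on_stock_reserved_delivery(order):
    bus.publish("deliveryReserved", order)

# Payment service: charges the customer once stock is reserved.
def on_stock_reserved_payment(order):
    bus.publish("paymentDone", order)

# ...and Delivery confirms the delivery after payment is done.
def on_payment_done(order):
    bus.publish("deliveryDone", order)

bus.subscribe("orderCreated", on_order_created)
bus.subscribe("stockReserved", on_stock_reserved_delivery)
bus.subscribe("stockReserved", on_stock_reserved_payment)
bus.subscribe("paymentDone", on_payment_done)

bus.publish("orderCreated", {"id": "o-1"})
print(bus.log)
```

Note that no single function describes the whole flow; it only emerges from the subscriptions, which is exactly what makes larger choreographies hard to follow.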
Coordinator based SAGA
With a coordinator, there is central logic that decides when and what to do. The coordinator tells the Stock, Delivery and Payment services what to do and listens for their responses. A coordinator may seem more complex or harder to implement, but because the whole flow lives in one place, it is much easier to maintain and to debug the state of the transactions.
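A minimal coordinator might look like the following Python sketch. The service classes and `SagaCoordinator` are hypothetical stand-ins with direct method calls instead of messages; the point is that the step order and a per-step history live in one object.

```python
class StockService:
    def reserve(self, order):
        return True  # hypothetical: always succeeds

class PaymentService:
    def __init__(self, approve=True):
        self.approve = approve

    def charge(self, order):
        return self.approve  # hypothetical: outcome controlled by a flag

class DeliveryService:
    def schedule(self, order):
        return True

class SagaCoordinator:
    """Hypothetical orchestrator that owns the whole flow."""

    def __init__(self, stock, payment, delivery):
        # The order of steps is defined in one place, not spread over event handlers.
        self.steps = [
            ("reserveStock", stock.reserve),
            ("chargePayment", payment.charge),
            ("scheduleDelivery", delivery.schedule),
        ]
        self.history = []  # (step, outcome) pairs: the saga's state, easy to inspect

    def execute(self, order):
        for name, command in self.steps:
            ok = command(order)  # tell the service what to do and wait for its reply
            self.history.append((name, "OK" if ok else "FAILED"))
            if not ok:
                return "FAILED"  # a real coordinator would start compensation here
        return "COMPLETED"

happy = SagaCoordinator(StockService(), PaymentService(), DeliveryService())
happy_result = happy.execute({"id": "o-1"})
print(happy_result, happy.history)

declined = SagaCoordinator(StockService(), PaymentService(approve=False), DeliveryService())
declined_result = declined.execute({"id": "o-2"})
print(declined_result, declined.history)
```

When something goes wrong, `history` shows exactly which step failed, which is the debugging advantage described above.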
What happens when there is a problem with the transaction?
Traditional databases roll back when there is a problem with a transaction. SAGA behaves similarly, but it is not a real rollback; it is a compensating transaction. Suppose a microservice receives a message to insert a record. Unlike in a traditional database, with SAGA the microservice needs to receive another message (a compensating action) to delete that record. The record really is inserted, and it is then deleted during the rollback.
In a choreography-based system, the service should catch the error and publish a compensating action. In a coordinator-based system, this is done by the coordinator.
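Here is a minimal Python sketch of compensation, assuming hypothetical services that each own a dict as their table and a simple driver loop that runs compensations in reverse order. The `events` log shows that records really are inserted and later deleted, rather than never written.

```python
events = []  # audit log of what each service actually did

class RecordService:
    """Hypothetical service that owns its own table (a dict here)."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.table = {}

    def insert(self, order_id):
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        self.table[order_id] = "INSERTED"  # committed locally; no global rollback exists
        events.append(f"{self.name}:insert")

    def delete(self, order_id):
        # Compensating action: semantically undoes insert().
        self.table.pop(order_id, None)
        events.append(f"{self.name}:delete")

def run_with_compensation(order_id, services):
    done = []
    for svc in services:
        try:
            svc.insert(order_id)
            done.append(svc)
        except RuntimeError:
            # Undo completed steps in reverse order.
            for completed in reversed(done):
                completed.delete(order_id)
            return "COMPENSATED"
    return "COMPLETED"

stock = RecordService("stock")
payment = RecordService("payment")
delivery = RecordService("delivery", fail=True)
result = run_with_compensation("o-1", [stock, payment, delivery])
print(result, events)
```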
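On the choreography side, a minimal Python sketch could look like this, again assuming a hypothetical in-memory `EventBus` and made-up `paymentFailed` and `stockReleased` events. The Payment service catches its own error and publishes an event, and the Stock service reacts by running its compensating transaction.

```python
from collections import defaultdict

class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()
stock = {"sku-1": 500}  # Stock service's own database

def on_order_created(order):
    stock[order["sku"]] -= order["qty"]  # local transaction in Stock's database
    bus.publish("stockReserved", order)

def on_stock_reserved(order):
    try:
        if order["amount"] > order["credit"]:
            raise ValueError("insufficient credit")
        bus.publish("paymentDone", order)
    except ValueError:
        # Payment catches its own error and announces the failure.
        bus.publish("paymentFailed", order)

def on_payment_failed(order):
    stock[order["sku"]] += order["qty"]  # compensating transaction
    bus.publish("stockReleased", order)

bus.subscribe("orderCreated", on_order_created)
bus.subscribe("stockReserved", on_stock_reserved)
bus.subscribe("paymentFailed", on_payment_failed)

bus.publish("orderCreated", {"sku": "sku-1", "qty": 2, "amount": 100, "credit": 50})
print(bus.log, stock)
```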
Frameworks to solve the problem
There are frameworks that address these microservices issues; Eventuate, Axon and MicroProfile LRA are some of them. They are all great, but none of them has a very big community behind it. Axon seems to be the most complete solution, but as of 2020 you may still consider developing your own solution.