There is a fundamental conflict between services and data systems: microservices are designed to encapsulate their data, while data systems are designed to expose it.
In a microservices-based architecture, an application is split into fragments, or microservices, separated by a network, which collaborate to achieve the same goals as a monolithic application.
Microservices encapsulate distinct code and logic and manage their own data. They allow for easier scaling, in both performance and organizational terms, and their biggest benefit, if done correctly, is that they can be deployed independently of one another.
There are several anti-patterns that couple microservices to each other, causing them to lose their most important benefit: the ability to be deployed independently.
Sharing a package among microservices is an anti-pattern: a change to the shared package may force several microservices to change their inner workings and their interactions with that package, preventing the services from evolving and being deployed independently of each other. To counter this, it is important to have well-defined responsibilities and clean separation between services, with contracts that clearly define each service's role. If a new requirement does not fit into any of these contracts, it is probably a good candidate for a new microservice.
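To make the idea of a service contract concrete, here is a minimal sketch in Python. All names (`Order`, `OrdersContract`, `InMemoryOrdersService`) are illustrative assumptions, not part of the original text; in practice the contract would live in an API schema (OpenAPI, gRPC) rather than a shared code package.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

# Hypothetical data shape the contract exposes to callers.
@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    total_cents: int

# The contract: callers depend only on this interface, never on the
# service's internal implementation or storage.
class OrdersContract(Protocol):
    def place_order(self, customer_id: str, total_cents: int) -> Order: ...
    def get_order(self, order_id: str) -> Optional[Order]: ...

# One possible implementation; it can be rewritten and redeployed
# freely as long as it keeps satisfying the contract above.
class InMemoryOrdersService:
    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}
        self._next = 0

    def place_order(self, customer_id: str, total_cents: int) -> Order:
        self._next += 1
        order = Order(f"ord-{self._next}", customer_id, total_cents)
        self._orders[order.order_id] = order
        return order

    def get_order(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)
```

The point of the sketch is the dependency direction: a new requirement that does not fit `OrdersContract` should not be forced into this service.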
In a microservice architecture, each microservice owns and manages its own set of data. If there is a reporting need for a larger set of data, an aggregation of the data managed by all the microservices, then the data from these microservices is gathered into a central location where the reporting needs are met. In such a setup, each microservice may represent the same underlying concept with a different data structure, and over time these structures can diverge. Copying and moving all this data around a distributed system, where there is essentially no single source of truth, can cause problems.
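A small sketch of the divergence problem, with entirely hypothetical record shapes: two services each model an "order", and the reporting pipeline must normalize every variant into a shared shape, growing a new branch for each divergence.

```python
# Hypothetical records from two services that each model the same order.
billing_order = {"orderId": "ord-1", "amountCents": 1500, "buyer": "cust-9"}
shipping_order = {"order_ref": "ord-1", "customer": "cust-9", "weight_kg": 2.0}

def normalize(record: dict) -> dict:
    """Map each service's private shape onto the reporting schema.

    Every time a service's representation drifts, this function (and
    every copy of this logic elsewhere) must grow another branch.
    """
    if "orderId" in record:
        return {"order_id": record["orderId"], "customer_id": record["buyer"]}
    if "order_ref" in record:
        return {"order_id": record["order_ref"], "customer_id": record["customer"]}
    raise ValueError("unknown record shape")
```

Both records describe the same order, yet neither service is the source of truth for it; the reporting layer is left reconciling copies.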
We should design systems with flexibility in mind, as there is always the possibility that a shared set of data between these services becomes a requirement. This can tempt teams to couple services through a shared data source that multiple services manipulate, which undermines the agility of the microservices.
The introduction of event streaming technologies such as Apache Kafka and Microsoft Azure Stream Analytics allows for a new approach to designing microservices, one that avoids these common pitfalls.
Using event sourcing, event stream platforms, and a distributed log, we can avoid the coupling caused by a shared data store that multiple services both manipulate and query, and allow the microservices to grow independently.
In such systems, all data is first treated as an event, and all events are stored in a distributed log managed by the streaming platform. Essentially, this means there is a single source of truth for all the data and events, which can be tapped into both for reporting purposes and for business requirements. The microservices that need this data have no control over it: they can only query it, react to it, or copy it into their own storage mechanisms.
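The pattern described above can be sketched with an in-memory stand-in for the distributed log. The names (`EventLog`, `BalanceProjection`) and the event shapes are assumptions for illustration only; a real system would use a Kafka topic and consumer offsets.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Event:
    offset: int
    kind: str
    payload: dict[str, Any]

class EventLog:
    """A minimal, in-memory stand-in for a distributed log (e.g. a Kafka
    topic). Producers append; consumers can only read from an offset."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, kind: str, payload: dict[str, Any]) -> Event:
        event = Event(len(self._events), kind, payload)
        self._events.append(event)
        return event

    def read_from(self, offset: int) -> list[Event]:
        return self._events[offset:]

class BalanceProjection:
    """A consumer service that copies events into its own store without
    ever mutating the log: the log remains the single source of truth."""

    def __init__(self, log: EventLog) -> None:
        self._log = log
        self._offset = 0
        self.balances: dict[str, int] = {}

    def catch_up(self) -> None:
        # Fold any new events into this service's private view of the data.
        for event in self._log.read_from(self._offset):
            if event.kind == "deposited":
                acct = event.payload["account"]
                self.balances[acct] = self.balances.get(acct, 0) + event.payload["amount"]
            self._offset = event.offset + 1
```

Note the design choice the sketch encodes: `BalanceProjection` holds a derived copy it can discard and rebuild, while only producers append to the log.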
With this setup we gain easy scalability, as the events are stored in a distributed log. We also preserve the separation of concerns between the microservices, as none of them can modify the data in a way that would cause discrepancies. As a result, we have a single source of truth that can be analysed at any time, whether for machine learning or business reporting. However, event sourcing is not a silver bullet, and its trade-offs must be weighed carefully before building a system around this pattern.
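"Analysed at any time" follows from the log being an append-only, complete history: any new consumer can rebuild its view by replaying from the beginning. A tiny self-contained sketch, with a hypothetical `Deposited` event type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deposited:
    account: str
    amount: int

# The full history: never mutated, only ever appended to.
history = [Deposited("a", 10), Deposited("b", 4), Deposited("a", 5)]

def replay(events: list[Deposited]) -> dict[str, int]:
    """Rebuild account balances from scratch by folding over the history.

    A brand-new consumer (a reporting job, an ML feature pipeline) needs
    no migration or data hand-off from other services; it just replays.
    """
    balances: dict[str, int] = {}
    for e in events:
        balances[e.account] = balances.get(e.account, 0) + e.amount
    return balances
```

The caveat in the text applies here too: replaying a long history is not free, which is one of the trade-offs (alongside snapshotting and schema evolution of events) to consider before committing to the pattern.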