Unlock efficient, scalable service communication with RabbitMQ
Modern digital platforms often require a composable, decoupled architecture, meaning that the platform is made up of a large collection of services (applications), each responsible for the business logic of a specific domain or an expertise within a domain. Whether you are developing a monolith or a composable, decoupled platform, many core functionalities are almost always required for any type of application.
One of the biggest unwanted symptoms for organizations that set out to develop such a platform is re-inventing the wheel for common core functionalities. This not only slows down time-to-market but also eats away at the precious budget they would rather spend on customer functionality that delivers value. My team and I set out to design and build a reusable foundation for a decoupled platform that contains these core functionalities out-of-the-box.
My team was tasked with identifying different types of core functionalities, many of which are non-functional in nature but aim to improve time-to-market and operational stability. Examples are Observability (logging, monitoring, E2E tracing and proactive alerting), database schema & data integrity management, and CI/CD automation. The key to any successful decoupled platform is a robust integration architecture that allows decoupled services to communicate effectively with each other.
Task at hand
The task at hand was to come up with an integration architecture and pattern that allows services to easily integrate their functionalities with each other whilst staying decoupled and independently operable. Because of its complexity, this is one of the biggest challenges of decoupled platforms.
We needed to keep dependencies between services to a minimum to allow for decoupling, so point-to-point connections between services were not an option. When platforms grow, point-to-point connections eventually lead to complex (circular) dependencies and other unwanted issues and constraints. We needed to avoid that at all costs. That left us with three options:
- Use a Message Broker
- Use an Event Bus
- Use a combination of both
After careful consideration we decided to split the integration architecture into two capabilities: the ability to send “commands and queries”, and the ability to share (data) state via “events” between services. I took responsibility for developing a reusable module that services can import to easily communicate “commands & queries” with each other via a Message Broker.
Course of action
After close examination of Message Brokers, we decided that RabbitMQ was the best fit. It is open-source, vendor-agnostic (including cloud vendors), and widely used and supported. My course of action was to develop a shared module based on AMQP 0-9-1 that initially supports the Pub-Sub pattern and the RPC pattern (required to support the synchronous request and response pattern for external-facing synchronous APIs). My teammate - working on Observability - shared his requirements to allow for tracing over our decoupled services. This meant my code should not only allow services to communicate with each other, but also carry the tracing context between services (propagation).
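To make the RPC pattern concrete, here is a minimal sketch of a request over AMQP 0-9-1, assuming the amqplib package as the client library; the queue names and header values are illustrative, not the module's actual API.

```typescript
import amqp from "amqplib";
import { randomUUID } from "node:crypto";

// Minimal RPC request: publish to a request queue, await the reply on an
// exclusive reply queue, and match request to reply via a correlation id.
async function rpcRequest(url: string, requestQueue: string, payload: unknown): Promise<unknown> {
  const connection = await amqp.connect(url);
  const channel = await connection.createChannel();

  // Exclusive, broker-named queue that only this requester consumes from.
  const { queue: replyQueue } = await channel.assertQueue("", { exclusive: true });
  const correlationId = randomUUID();

  const reply = new Promise<unknown>((resolve) => {
    channel.consume(
      replyQueue,
      (msg) => {
        if (msg && msg.properties.correlationId === correlationId) {
          resolve(JSON.parse(msg.content.toString()));
        }
      },
      { noAck: true }
    );
  });

  channel.sendToQueue(requestQueue, Buffer.from(JSON.stringify(payload)), {
    correlationId,
    replyTo: replyQueue,
    // Custom headers carry the propagated tracing context between services.
    headers: { traceparent: "00-<trace-id>-<span-id>-01" }, // illustrative value
  });

  const result = await reply;
  await channel.close();
  await connection.close();
  return result;
}
```

The responder consumes from the request queue and publishes its answer to `msg.properties.replyTo` with the same `correlationId`, which is what makes the synchronous request and response pattern possible over an asynchronous broker.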
I set out with the development of a shared module in TypeScript that can be imported into other services. I had started adopting TypeScript as a skill earlier, which you can read more about here. In another project we had already used AI to translate code from TypeScript to other languages like Java and Go, for future compatibility if that is ever required, so TypeScript was a safe decision.
The module exports the following (see the sketch after this list):
- A RabbitMQ Client instance with configuration options for connections
- Methods to produce and consume messages for Pub-Sub and RPC easily, abstracting away the underlying complexity
- Methods to set and retrieve the propagated context, fulfilling the Observability requirements
- Methods for the lifecycle management of RabbitMQ exchanges, queues and routing keys, including their dead letter equivalents
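As a sketch of that surface, the interface below shows how these exports could hang together; the names and signatures are hypothetical, chosen for illustration rather than taken from the actual module.

```typescript
// Hypothetical shape of the shared module's exports; illustrative only.
export interface RabbitMqClientOptions {
  url: string;               // e.g. amqp://user:pass@host:5672/vhost
  reconnectDelayMs?: number; // back-off between reconnect attempts
}

// Propagated Observability context carried in message headers.
export interface PropagatedContext {
  traceparent?: string; // W3C Trace Context header value
}

export interface RabbitMqClient {
  // Pub-Sub: fire-and-forget publishing and queue consumption.
  publish(exchange: string, routingKey: string, payload: unknown, ctx?: PropagatedContext): Promise<void>;
  subscribe(queue: string, handler: (payload: unknown, ctx: PropagatedContext) => Promise<void>): Promise<void>;

  // RPC: request/response over the broker for synchronous use cases.
  request(queue: string, payload: unknown, ctx?: PropagatedContext): Promise<unknown>;
  respond(queue: string, handler: (payload: unknown, ctx: PropagatedContext) => Promise<unknown>): Promise<void>;

  // Lifecycle management of exchanges, queues and routing keys,
  // including their dead letter equivalents.
  assertTopology(exchange: string, queue: string, routingKey: string): Promise<void>;

  close(): Promise<void>;
}

// Factory wrapping an AMQP 0-9-1 client library such as amqplib.
export declare function createRabbitMqClient(options: RabbitMqClientOptions): Promise<RabbitMqClient>;
```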
I defined a base message schema that covers the message properties, custom headers and the payload. I integrated schema validation and global error responses in all producer and consumer methods. Schema validation for requests and responses is extremely important, especially in an asynchronous, decoupled architecture: it ensures messages meet the agreed interface schemas, avoiding critical issues and errors. We publish the (versioned) message schemas to an AsyncAPI catalog so that developers know how to send messages between services.
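An illustrative version of such a base schema, here assuming zod as the validation library (the actual module may use a different validator):

```typescript
import { z } from "zod";

// Base envelope every message must conform to: properties, custom headers, payload.
const MessageEnvelope = z.object({
  properties: z.object({
    messageId: z.string().uuid(),
    timestamp: z.number().int(),        // epoch millis
    contentType: z.literal("application/json"),
  }),
  headers: z.object({
    schemaVersion: z.string(),          // ties the message to its AsyncAPI schema
    traceparent: z.string().optional(), // propagated Observability context
  }),
  payload: z.unknown(),                 // validated further against the versioned payload schema
});

type Envelope = z.infer<typeof MessageEnvelope>;

// Both producers and consumers validate; invalid messages trigger a
// global error response instead of propagating downstream.
function validateMessage(raw: unknown): Envelope {
  const result = MessageEnvelope.safeParse(raw);
  if (!result.success) {
    throw new Error(`Message failed schema validation: ${result.error.message}`);
  }
  return result.data;
}
```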
The outcome
The results of our efforts with RabbitMQ at its core are amazing!
We managed to:
- Enable fast, high-quality and traceable communication between services
- Enhance the platform's responsiveness, availability, and scalability
- Decrease our response times drastically (to below 10 ms on average)
- Reduce errors to a minimum through schema validation and global error handling
- Ensure (persistent) message delivery, even when consuming services are down
The dead letter capabilities of RabbitMQ provide the assurance that services can always process messages, even when a message is not processed correctly on the first try. Data loss is therefore eliminated and services remain fully decoupled. Allowing the propagated context for Observability to be sent with messages means that the platform supports tracing and alerting across all of its services. (More on Observability with OpenTelemetry to come in the future.)
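As an illustration of how such a dead letter topology can be declared, again assuming amqplib and with purely illustrative exchange and queue names:

```typescript
import amqp from "amqplib";

async function assertQueueWithDeadLetter(url: string): Promise<void> {
  const connection = await amqp.connect(url);
  const channel = await connection.createChannel();

  // Dead letter exchange and queue capture messages that are rejected or expire.
  await channel.assertExchange("orders.dlx", "direct", { durable: true });
  await channel.assertQueue("orders.dead-letter", { durable: true });
  await channel.bindQueue("orders.dead-letter", "orders.dlx", "orders");

  // The working queue routes failed messages to the dead letter exchange,
  // so a rejected message is parked instead of lost.
  await channel.assertQueue("orders", {
    durable: true,
    deadLetterExchange: "orders.dlx",
    deadLetterRoutingKey: "orders",
  });

  await channel.close();
  await connection.close();
}
```

When a consumer rejects a message without requeueing it (`channel.nack(msg, false, false)`), RabbitMQ routes it to the dead letter exchange, where it can be inspected and retried later.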
Looking back
Developing a module and patterns like this was something we completely underestimated. This module and its patterns are at the heart of the platform, since they handle the majority of all data flows. It is absolutely vital to get these core functionalities and patterns right: any error in this module or its patterns impacts the majority of the platform's modules and services and cannot be easily rectified. It is also very important to understand all the ins and outs of RabbitMQ, including how exchanges, queues, vhosts and authorizations work under the hood, and how to fully utilize their advantages. I am very proud that we were able to complete this very complex challenge, since we now have a stable integration module in the platform's foundation!