I am proud of the codebase we have at Dyspatch today, and of the road we took to get here. As we rewrite components as microservices, I’ve had time to reflect on that road. My aim with this post is to document our path, thought processes, and goals, in the hope it may help others.
Our code started out as a monolithic Django app, running on Heroku with a single Postgres database backing it. This helped get the product to market quickly. Once it was out there, we iterated, adding features and evolving the codebase. The dataset grew, as did the features, but we tried to keep things maintainable by building components to be reentrant, horizontally scalable, and as modular as possible without introducing excess complexity. Reentrant code is the first step toward autoscaling resources – if the code can run in parallel with itself, it can be scaled horizontally. We did this via Heroku’s dyno API, with the work coming in via SQS queues. Backlog in the queue? Add more machines. Queue empty? Reduce machines. Don’t thrash. Done.
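To make that concrete, here is a rough sketch of that kind of scaling loop in Go, using the aws-sdk-go SQS client. The queue URL, the thresholds, and the `scaleWorkers` helper (a stand-in for a call to Heroku’s dyno API) are all illustrative assumptions, not our actual code.

```go
// A minimal sketch of queue-depth-based scaling, assuming aws-sdk-go and a
// hypothetical scaleWorkers helper that would call the Heroku dyno API.
package main

import (
	"fmt"
	"log"
	"strconv"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

const (
	queueURL      = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue" // hypothetical
	scaleUpDepth  = 1000 // backlog deep enough to justify another worker
	scaleDownIdle = 0    // empty queue: shed a worker
)

func main() {
	svc := sqs.New(session.Must(session.NewSession()))

	for range time.Tick(1 * time.Minute) {
		out, err := svc.GetQueueAttributes(&sqs.GetQueueAttributesInput{
			QueueUrl:       aws.String(queueURL),
			AttributeNames: []*string{aws.String("ApproximateNumberOfMessages")},
		})
		if err != nil {
			log.Printf("could not read queue depth: %v", err)
			continue
		}

		depth, _ := strconv.Atoi(aws.StringValue(out.Attributes["ApproximateNumberOfMessages"]))
		switch {
		case depth >= scaleUpDepth:
			scaleWorkers(+1) // backlog in the queue? add a machine
		case depth == scaleDownIdle:
			scaleWorkers(-1) // queue empty? shed a machine
		default:
			// in between: do nothing
		}
	}
}

// scaleWorkers is a stand-in for a call to the Heroku dyno (formation) API.
func scaleWorkers(delta int) {
	fmt.Printf("adjusting worker count by %d\n", delta)
}
```

The dead zone between the two thresholds is what keeps the loop from thrashing: a queue has to be genuinely backed up, or genuinely empty, before anything changes.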
This supported us well for several years. As the features grew in number, the monolith started to become unwieldy. We started to see problems with client load times, performance issues with Python, and requirements for specialized infrastructure for specific components. The design pattern we were using wouldn’t support us forever, so we needed to shift. We decided to move to microservices over an extended period, while continuing to build new features. New features and components would be built as microservices when and where it made sense, and when refactoring, we would pull existing features and components out of the monolith and replace them with specialized microservices.
To address client load time, we switched from server-side rendering to a single-page app. This allowed us to isolate the CSS/HTML/JS resources and put them behind a CDN. An added advantage was the isolation of a concern for the frontend team – a single repo representing the UI that could be deployed separately.
The next issue to tackle was a new feature: incoming webhooks. This would receive a lot of traffic. About 95% of that traffic is not useful and needs to be thrown away, but the remaining 5% adds a lot of value if we can surface analytics from it. Writing this in Python wouldn’t make sense, since we’d need far more machines to process that much data. Golang seemed like a sensible language for the job, since it is very good at concurrency. The code was simple: it filtered the incoming data and pushed the desired 5% into an SQS queue for the monolith to consume and process at its leisure. This was our first true microservice. It had a simple interface for supplying the limited amount of data the service needed to know how to filter incoming requests, and it passed the desired data through.
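As a rough illustration (not our production code), a filter like that can fit in a single handler. The event types and queue URL below are made up; in reality the filter criteria came from the monolith over that simple interface.

```go
// A sketch of a webhook filter: drop the ~95% we don't care about, enqueue the rest.
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

// Hypothetical set of event types worth keeping.
var wanted = map[string]bool{"open": true, "click": true}

func main() {
	queue := sqs.New(session.Must(session.NewSession()))
	queueURL := "https://sqs.us-east-1.amazonaws.com/123456789012/webhook-events" // hypothetical

	http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}

		var event struct {
			Type string `json:"type"`
		}
		// Events we can't parse or don't care about are simply dropped (the ~95%).
		if json.Unmarshal(body, &event) != nil || !wanted[event.Type] {
			w.WriteHeader(http.StatusOK)
			return
		}

		// The interesting ~5% goes onto SQS for the monolith to process at its leisure.
		_, err = queue.SendMessage(&sqs.SendMessageInput{
			QueueUrl:    aws.String(queueURL),
			MessageBody: aws.String(string(body)),
		})
		if err != nil {
			log.Printf("enqueue failed: %v", err)
			http.Error(w, "try again", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```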
Over the next year, we had two specialized datastore requirements: the first needed high-volume sequential writes; the second, mostly random-access updates. Each dealt with its own isolated type of data, so both were built out as microservices owning the storage, processing, and monitoring of their data. With the high number of calls to each of these services, we needed a sensible communication layer, preferably one that could clearly define the contract between services. We chose gRPC, with the contracts defined in protobuf files. Each client service pulls a proto file and can build the latest client. The proto file lives with the service, which guarantees backwards compatibility with any version currently in use. gRPC gives us some extra speed compared to our RESTful alternative: SSL handshakes don’t happen on every call, a bidirectional stream is opened up, and each message is serialized to a binary format before being sent across the wire.
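On the client side, the pattern looks roughly like the sketch below. The storage service, its generated `pb` package, and the `Put` RPC are hypothetical stand-ins rather than our real contracts; the point is that a single SSL/TLS handshake establishes one long-lived connection that every subsequent call reuses.

```go
// A minimal sketch of a client service talking to a datastore service over gRPC,
// assuming a hypothetical storage.proto that generates a StorageClient with a Put RPC.
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"

	pb "example.com/storage/proto" // hypothetical generated package
)

func main() {
	creds, err := credentials.NewClientTLSFromFile("ca.pem", "")
	if err != nil {
		log.Fatalf("loading TLS credentials: %v", err)
	}

	// One dial, one handshake; the connection is multiplexed and reused for
	// every subsequent call instead of reconnecting per request.
	conn, err := grpc.Dial("storage.internal:443", grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatalf("dialing storage service: %v", err)
	}
	defer conn.Close()

	client := pb.NewStorageClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Each message is serialized to protobuf's binary format on the wire.
	if _, err := client.Put(ctx, &pb.PutRequest{Key: "k", Value: []byte("v")}); err != nil {
		log.Printf("put failed: %v", err)
	}
}
```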
At this point, we are building major new features in their own services, and we’ve gone on to replace several large components of our codebase, isolating the data and processing concerns of each into its own microservice. But we still have more to do. The point where all data and processing layers have been removed and only the business logic remains in the view functions is on the horizon. I believe the next steps will involve a slow shift to a GraphQL service, migrating frontend components one at a time to depend on the GraphQL data. Then, once all systems have shifted over, the dream will be fulfilled: finally retiring the monolith. This is, I believe, an attainable goal.