Microservices architecture allows development to scale horizontally
Industry-adopted best practices for building scalable web applications are always evolving. One of the most significant is the transition to microservices architecture — breaking a large single-service "monolithic" web application into smaller, independently deployable, independently scalable, decoupled web services. This architecture has been proven at large companies like Netflix and championed by industry experts such as Adrian Cockcroft and Martin Fowler.
Cradlepoint is well down the road of adopting a microservices architecture for our SaaS products — specifically NetCloud Manager (NCM).
Outgrowing Our Monolith
When NCM was first released, it was written as a monolithic application. A single small team of engineers contributed to this codebase. As is typical in monolithic deployments, we achieved horizontal scale and high availability by replicating the application across multiple virtual machines and using a load balancer to distribute client requests among them.
This architecture worked well for a couple of years, but NCM's success drove new requirements to meet business demands.
| Requirement | Example |
| --- | --- |
| We needed to scale different parts of our application differently. | NCM holds persistent connections to hundreds of thousands of Cradlepoint routers at all times. This puts a heavy, nearly steady-state load on the router connection portion of the application. In contrast, the NCM web service that hosts cradlepointecm.com creates less load and only sees usage spikes during business hours. The router connection service and web service needed to scale independently. |
| We needed to update code with zero downtime. | You can accomplish this with a monolith, but our router connection requirements made it a challenge. To update code with zero downtime, we needed to decouple the process that held router connections from the process that contained our business logic. |
| We needed to scale feature throughput as we hired more developers. | Understanding the ever-growing monolith codebase became a challenge, especially for newly hired developers. |
| We needed to empower teams to qualify and deploy code at their own cadence. | Coordinating changes and deployments across multiple teams became a challenge. One team's late feature or critical defect would slow down our entire deployment pipeline. |
Our requirements were not unique. We decided to move to a microservices architecture since it was the industry-standard solution to address these challenges.
Building the Services
There was a lot of literature about how to partition an application into microservices, but unfortunately most of it was targeted at green-field implementations. We already had our monolith, so we needed to evolve our application architecture while maintaining site reliability and adding new application features. Our migration used two approaches:
- We looked at what we could pull out of the monolith. We had to figure out which business domains made sense to extract. We examined our lab's organization and found logical boundaries for services such as Accounts Services, Licensing Services, and Data Pipeline.
- For every new feature, we had to decide whether the new code belonged in a new service, an existing service, the monolith, or some combination of these. NCM's "Remote Connect" and "Activity Log" features are excellent examples of features we implemented as new microservices.
The Good and the Bad
There is no one-size-fits-all "best SaaS architecture." A healthy application architecture is about effectively managing tradeoffs. We solved the aforementioned requirements by using microservices, but we inherited new challenges in the process. Most challenges were expected; some were not.
Problems inherent to microservices are well documented and understood. Distributed systems are complex. Network calls fail. State must be replicated. Standardization is hard. Knowing this before we made the transition, we did some things that worked great:
| Practice | Benefit |
| --- | --- |
| We use event-driven asynchronous messaging through RabbitMQ. | We reduced service coupling and simplified our distributed architecture (see the messaging sketch after this table). |
| We built service framework libraries to standardize messaging, logging, and auditing for all microservices. | We reduced copy/paste and duplicate coding efforts between service teams. |
| Our entire continuous integration pipeline uses Docker and Docker Compose. | Any Jenkins worker or developer can build and unit test any microservice without needing to install dependencies. |
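As a concrete illustration of the first two practices, here is a minimal sketch of what a shared messaging helper in a service framework library might look like, assuming the Python `pika` client for RabbitMQ. The exchange name, routing key, and `publish_event` helper are hypothetical examples for this post, not our actual framework code.

```python
# Minimal sketch of a shared messaging helper, assuming the `pika`
# RabbitMQ client (pip install pika). Exchange and routing-key names
# are hypothetical illustrations.
import json

import pika


def publish_event(routing_key: str, payload: dict,
                  amqp_url: str = "amqp://guest:guest@localhost:5672/%2F") -> None:
    """Publish a JSON event to a durable topic exchange."""
    connection = pika.BlockingConnection(pika.URLParameters(amqp_url))
    try:
        channel = connection.channel()
        # A topic exchange lets consumers bind by pattern (e.g. "router.*")
        # without the publisher knowing who, if anyone, is listening.
        channel.exchange_declare(exchange="events", exchange_type="topic",
                                 durable=True)
        channel.basic_publish(
            exchange="events",
            routing_key=routing_key,
            body=json.dumps(payload),
            properties=pika.BasicProperties(
                content_type="application/json",
                delivery_mode=2,  # persist messages across broker restarts
            ),
        )
    finally:
        connection.close()


# The publisher emits a fact and moves on; it never calls another
# service directly, which is what keeps the services decoupled.
publish_event("router.connected", {"router_id": "r-123", "account": "a-9"})
```

Putting helpers like this in a shared library means every team publishes events the same way, which is exactly the duplicate-coding problem the framework libraries were built to solve.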
Microservices allowed our service teams to operate independently. At first we embraced this philosophy. Each team could choose its own language, test frameworks, and deployment tools. As long as they adhered to the three practices above, we encouraged teams to do what they wanted to get their features out. Our service decoupling made this possible. Unfortunately, as the number of services increased, so did the number of different ways we did things. This presented new challenges that we needed to overcome.
| Challenge | Resolution |
| --- | --- |
| Because every service team could deploy its own way, we quickly accumulated several different deployment processes. | Migrate all services to standardized Kubernetes + Helm deployments. |
| Because every service team could pick its own language, building framework libraries for multiple languages was going to be a challenge. | Reduce to two language/framework options: Python/Django and Java/Spring Boot. |
| Every service needed to authorize inbound calls. | Build a common service JWT and authorization framework (see the sketch after this table). |
| Every service needed to expose external APIs. | Build a common API gateway. |
| Creating a new microservice was tedious: it involved a lot of copy/paste code and required a new git repository, CI jobs, DNS setup, etc. | Automate the microservice creation process. |
| We cannot fully replace the monolith without significant R&D investment or even a complete rewrite. | Simplify working with the monolith: change it to use the same microservice standards and libraries so it follows the same small, rapid development and release cycle as the smaller services. |
| Local development was challenging because coding one service usually required installing and running multiple other dependent services. | We first addressed this with Docker Compose, but now we use Kubernetes. Each developer can have their own one-click-deployed service stack and use tools like Telepresence to write and debug their service in their local development environment. |
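To make the JWT resolution above concrete, here is a minimal sketch of what a common service-authorization helper might look like, assuming the PyJWT library and a shared HS256 secret. The claim layout, helper names, and allow-list check are hypothetical examples; a production framework would more likely use asymmetric keys pulled from a secrets store.

```python
# Minimal sketch of service-to-service JWT authorization, assuming the
# PyJWT library (pip install PyJWT). Names and claims are hypothetical.
import time

import jwt  # PyJWT

# Hypothetical shared secret; a real deployment would fetch this from
# a secrets store and would likely prefer asymmetric (RS256) keys.
SHARED_SECRET = "replace-with-a-real-secret"


class ServiceAuthError(Exception):
    """Raised when an inbound service token fails validation."""


def issue_service_token(service_name: str, ttl_seconds: int = 300) -> str:
    """Mint a short-lived token identifying the calling service."""
    now = int(time.time())
    claims = {"sub": service_name, "iat": now, "exp": now + ttl_seconds}
    return jwt.encode(claims, SHARED_SECRET, algorithm="HS256")


def authorize_inbound_call(token: str, allowed_services: set) -> str:
    """Validate a token and check that the caller is on the allow list."""
    try:
        # decode() verifies the signature and the "exp" claim for us.
        claims = jwt.decode(token, SHARED_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError as exc:
        raise ServiceAuthError("invalid or expired token") from exc
    caller = claims["sub"]
    if caller not in allowed_services:
        raise ServiceAuthError("service %r is not authorized" % caller)
    return caller


token = issue_service_token("licensing-service")
print(authorize_inbound_call(token, {"licensing-service"}))
```

Because every service imports the same helper, authorization checks behave identically across the fleet instead of being reimplemented per team.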
Business demands, customer usage, and engineering capacity all affect decisions on when and how we address technical issues. Technical solutions like those listed above are not simple, and we are still working on some of them. Our service teams carefully balance working on new customer features with addressing technical items.
Takeaway
Cradlepoint, like other SaaS companies, has chosen a microservices architecture to build out its cloud application. All architectures have tradeoffs. Microservices have allowed development to scale horizontally: we can create new service teams that operate in parallel, increasing feature velocity. This comes with an inherent cost of inconsistency and possible duplication of work.
Allowing teams full autonomy has a price at scale, so early standardization is important in reducing future overhead. We successfully use Docker and Kubernetes to standardize our build, test, CI, and deployment pipelines. We use common libraries, asynchronous messaging, and common core services to minimize the complexity of a distributed application.
We still have work to do, and we are looking at new technologies like service mesh to further simplify microservices development and operations. This “Behind the Code” post was intentionally broad so we can dive into specifics in future articles within this series.
“Behind the Code” is a series of blog posts, written by Cradlepoint engineers, about behind-the-scenes topics and issues affecting software development.