On Autonomy, Codfishes & Functional Monoliths
PG Wodehouse contrasts the paternal annoyance of the absent-minded peer with the cheerful insouciance of a codfish. A humble codfish lays a million eggs in one go. In fact, according to www.ucd.ie/codtrace/codbio.htm, a female codfish can lay up to 2.5M eggs a year. When confronted with the prospect of suddenly becoming a parent to over a million offspring, a codfish does not share Lord Emsworth’s fear or disdain. Instead, it cheerfully resolves to love them all equally. How does it manage? One possible explanation is that the codfish is able to maintain its stoicism because it is not besieged by requests from its offspring for money or food every minute of the day. So the key to “mothering” millions of offspring is to make them all responsible for their own destiny: in short, to make them autonomous.
Ops teams in big companies are in the same quandary. They don’t mind “mothering” multiple apps. In fact they gear themselves up to support multiple apps by purchasing expensive and extensive monitoring and self-healing tools. They focus on platform observability. They hire Site Reliability Engineers (SREs) – one of the new fancy titles that engineers like to go by these days.
Though they are geared up to support multiple apps, they want these apps to be autonomous to an extent. They want these apps to be more “codfish”esque and less “Freddie”ish. No barraging with constant requests for more memory, back-ups and whatnot. They want these to be handled automatically so they can concentrate on better things. Ops people need to spend more time optimizing and automating than doing grunt work. In fact, many people recommend that ops folks spend less than 40% of their time on routine grunt work and the rest on innovation and automation. This is obviously not possible if each of your apps is bothersome to monitor and maintain.
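To make the 40% figure concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (team size, hours per week, hours of routine work per app) is a made-up assumption for illustration, not a measurement; only the 40% ceiling comes from the recommendation above.

```python
def toil_fraction(num_apps, toil_hours_per_app, team_size, hours_per_engineer=40.0):
    """Fraction of the ops team's weekly hours eaten up by routine work."""
    capacity = team_size * hours_per_engineer
    return (num_apps * toil_hours_per_app) / capacity


def max_apps_within_budget(toil_hours_per_app, team_size, budget=0.40,
                           hours_per_engineer=40.0):
    """Largest number of apps the team can carry without exceeding the budget."""
    capacity = team_size * hours_per_engineer
    return int((capacity * budget) // toil_hours_per_app)


# Hypothetical team: 5 engineers, each app needing 2 hours of grunt work a week.
print(toil_fraction(5, 2.0, 5))        # a handful of monoliths
print(toil_fraction(100, 2.0, 5))      # a hundred needy apps
print(max_apps_within_budget(2.0, 5))  # ceiling under the 40% budget
```

Under these assumed numbers, the same team that comfortably looks after a handful of apps is completely consumed by a hundred of them. The only ways out are fewer apps or less toil per app, which is to say, more codfish-like apps.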
Ultimately, the best behaved app is the app that does not exist! So the ops teams would ideally want to minimize the number of apps that they have to maintain.
Micro Services & App Proliferation
Micro Services came in to solve a few problems. However, they are creating a lot of new problems of their own. One of them is that of app proliferation.
Imagine a big organization that has decided to break existing monoliths into a zillion Micro Services. Many organizations have somewhere in the ballpark of 200-400 Micro Services. In fact, it is not hard to find organizations with over 1,000 Micro Services. If each of them needs to be hosted in its own app, we suddenly go from one app to potentially over 400 apps.
That would be fine if the apps were codfishes. But as we know, most apps are Freddie Threepwoods. They barrage the Guv’nor with constant demands. They need to be monitored for exceptions. Back-ups must be taken. Memory needs to be increased.
Imagine a company that has an ops team maintaining about 5 monolithic applications. In a matter of days, the application mitosis process (if I am allowed to borrow a term from Cell Biology) starts, and before they can say WTF they have something like a hundred apps to monitor. One application is running out of disk space whilst another is throwing too many NPEs (Null Pointer Exceptions). How will they deal with this menace?
One obvious solution is to make the Micro Services teams autonomous. In addition I would make one recommendation which is the chief subject of this article.
Make Micro Services sufficiently coarse-grained. In short, don’t create nano services. Consolidate several Micro Services into mini monoliths.
I can imagine the outraged cry from some people. They might say, “You are telling me that I should create monoliths again? Haven’t we learnt from history?”
So, to unpack this a little, let us delve briefly into Micro Services and how they came about.
On Conway’s Law
Mel Conway wrote a paper titled “How Do Committees Invent?” in 1967 (it was published in 1968). In it he stated that the artefacts produced by a committee mirror its internal structure. If you have a Foo committee and a Bar committee, they will create a Foo artefact and a Bar artefact. Any attempt to change this situation can create considerable friction in the organization.
In the last few years, most organizations have taken Conway’s law to heart. They started breaking themselves down into small, cohesive functional groups. These groups started to create Services that mirror the org structure. Each functional group started owning the development and deployment of these Micro Services. A new paradigm was born. The SOA wine was re-packaged nicely into the new Micro Services bottle.
Papers were written and videos were produced expatiating on the virtues of this new paradigm. Consulting organizations invented a spate of new terms and charged a bomb to implement them. Others invented various deployment frameworks. Terms such as Kubernetes, Terraform, Cloud based deployments, Infrastructure as a Service etc. were freely sprayed about the landscape. We spoke about immutable architectures, auto-scalability etc. ad nauseam. Everyone was happy. Well, almost everyone.
Ops folks were notable exceptions to this elated spirit that seemed to have set in. We can’t really blame them though. They faced a huge proliferation of apps that needed to be monitored. These apps did not have the maturity that autonomy requires. In other words, they had apps that pretended to be codfishes but were actually Freddie Threepwoods. They bothered the “Pater” for every little thing on a daily basis.
App Convergence in Production
Organizations might have multiple pre-prod environments, but they have only one production environment (sic). Hence, if there are multiple apps, they all converge in production and need to be managed by one ops team. By having multiple apps per functional group, we expose the ops team to problems that are best dealt with internally within the functional group: problems such as dependency management between services, service versioning etc. It is too big a task for one ops team to handle this across an organization with potentially hundreds of Micro Services.
Functional Monoliths – The best of both worlds
Each functional group would probably manage a good 15 to 20 Micro Services. So why do they need deployment independence between these Micro Services? Isn’t that taking the mitosis process too far? This question has been bothering me for quite some time. I feel that by combining the deployment of 15 to 20 Micro Services into one app, you would drastically reduce the number of apps that the ops team needs to monitor in production, thereby making the platform far easier to keep reliable. It is far easier to monitor 10 apps than 200 apps! At the same time, scalability and flexibility are still achieved at the level of the functional group.
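The arithmetic behind “10 apps rather than 200” can be sketched in a few lines of Python. The service inventory, the group names and the function plan_functional_monoliths below are all hypothetical, invented purely to illustrate the idea of one deployable per functional group. This is not a description of any real deployment tool.

```python
from collections import defaultdict


def plan_functional_monoliths(service_owners):
    """Group services by owning functional group; each group becomes one app."""
    clusters = defaultdict(list)
    for service, group in service_owners.items():
        clusters[group].append(service)
    return {group: sorted(services) for group, services in clusters.items()}


# Hypothetical inventory: 10 functional groups owning 20 Micro Services each.
service_owners = {
    f"group{g:02d}-svc{s:02d}": f"group{g:02d}"
    for g in range(10)
    for s in range(20)
}

plan = plan_functional_monoliths(service_owners)
apps_before = len(service_owners)  # one app per Micro Service
apps_after = len(plan)             # one app per functional group

print(apps_before, "apps shrink to", apps_after)
```

The services themselves do not go away. Each functional group still owns and develops its 20 services, but the ops team sees one deployable per group instead of one per service.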
If the word monolith is repugnant to you as an architect, call these functional monoliths by a cooler name. We used to call them Modular Deployment Clusters (MDC). Besides avoiding the term monolith, MDC is also a TLA (Three Letter Acronym) and we all know how techies love TLAs!
Functional monoliths (or MDCs) provide a lot of advantages:
1. Operational Ease – fewer apps for the ops team to maintain.
2. Deployment Ease – fewer apps to deploy.
3. Easier Dependency Management & Testing – One of the biggest problems with Micro Services is the intricate service dependency tree that needs to be maintained. MDCs drastically reduce the amount of dependency management required. This makes testing easier.
4. Right Level of Autonomy – Functional groups indeed start serving as mini organizations within the larger organization. MDCs foster autonomy without over-burdening the ops team.
5. Performance – Micro Services can be big performance hogs. We should be wary of the N+1 selects problem which is too big a topic for me to squeeze into this article. MDCs allow teams to optimize for performance at the functional group level.
6. Library Management – Code libraries are far easier to upgrade in functional monoliths than with individual Micro Services. Imagine 100 Micro Services using one code library. If the library changes, all of them need to be re-tested and re-deployed! Functional monoliths minimize this effort.
7. There are tons more, but this article has already become too long.
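For readers who have not met the N+1 problem mentioned in point 5, here is a minimal sketch in Python. The CustomerService class is only a stand-in that counts calls. In a real system each call would be a network round trip to another Micro Service. The class, its get and get_many methods and the order data are all assumptions made up for this illustration.

```python
class CustomerService:
    """Stand-in for a remote Micro Service; counts the calls made to it."""

    def __init__(self, customers):
        self._customers = customers
        self.calls = 0

    def get(self, customer_id):
        self.calls += 1
        return self._customers[customer_id]

    def get_many(self, customer_ids):
        self.calls += 1
        return {cid: self._customers[cid] for cid in customer_ids}


def names_one_by_one(orders, service):
    # One lookup per order: N calls on top of the one that fetched the orders.
    return [service.get(order["customer_id"]) for order in orders]


def names_batched(orders, service):
    # A single batched lookup covering all distinct customer ids.
    ids = {order["customer_id"] for order in orders}
    found = service.get_many(ids)
    return [found[order["customer_id"]] for order in orders]


customers = {i: f"customer-{i}" for i in range(10)}
orders = [{"id": n, "customer_id": n % 10} for n in range(50)]

chatty, batched = CustomerService(customers), CustomerService(customers)
assert names_one_by_one(orders, chatty) == names_batched(orders, batched)
print(chatty.calls, "calls versus", batched.calls)
```

Both approaches return the same answer, but the chatty version makes one call per order. When the services involved sit inside the same functional monolith, the owning team is in a good position to spot and fix such patterns, since both sides of the call are theirs.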
There are some important considerations and patterns to keep in mind when designing functional monoliths. I will probably dedicate another article to that.
On a more philosophical note
Nature is an awesome teacher. Organisms that need to breed in large numbers (codfishes, for instance) give their offspring the right level of autonomy so as not to overburden the parent with unsustainable management overhead. By building these principles into the design of our organizations, we take a leaf out of nature’s book and allow our human-made organizations to grow more organically.