Architecture Pattern: Machine Learning Model as a Service Backend
Over the years, many design patterns have been invented and documented. A new pattern is now emerging, one that relies on a generalized machine learning model as the core of a service backend.
To be clear, the claim is not about developing a highly specialized, single-purpose model that fulfills one, often very narrow, product feature. Solutions like that have existed for a long time and are already well researched, documented, and tested. The best-known examples are the recommendation systems used by companies like Google, Facebook, Netflix, and Amazon, which many of us are familiar with. The true novelty is an architecture design pattern in which a generalized machine learning model sits at the core of the service architecture: a model that supports virtually all of the product’s features. The word “supports” is used here on purpose, as opposed to “implements”, because the model itself will contain little to no code tailored to a particular feature and will instead be general enough to handle all of the product requirements.

This opens up completely new possibilities, even for highly dynamic product requirements, which could change without the need for major re-architecting and re-implementation. The core of the model would remain virtually unchanged. This is not meant to sound naive or over-optimistic. Today, state-of-the-art models require training and then meticulous fine-tuning to achieve an acceptable level of accuracy. That process will remain mandatory for the foreseeable future. Yet we are increasingly moving towards a world where the models themselves will be more and more capable, and general enough to be applied easily to a wide set of use cases.
Over the past two decades, many Software as a Service products have been developed, and many of them required solving complex architecture or even distributed systems problems to enable a specific set of features. They required a significant amount of time to design, develop, and operate. Many of these services were worked on continuously, improving gradually as new features were added and bugs were fixed. The effort put into that development can be measured in the money, time, and resources it took to build those products. The ones that became most successful face a never-ending stream of new requirements, improvements, and changes.
What if all of that well-established software development process were to become a thing of the past? The emerging generalized Large Language Models open up the possibility not only of using them in various interactive applications but, more fundamentally, of building a generalized computation platform that can become a core component in architecting and developing all kinds of commercial products. In the three years since the release of GPT-3 in 2020, multiple startups have emerged whose products were made possible only by the invention of Large Language Models.
Yet there is still a design paradigm from which the industry hasn’t shifted. The current mindset is to develop a product, usually in the form of SaaS, with the ML model treated as an add-on. In this post, I am going to claim that it is already possible today to build a product that has a machine learning model at its core, with that model supporting every single product feature. Such a backend model would still be hidden behind an API and, as of today, would still require integration code in front of it. Yet it is the model that performs all of the business-critical logic, without a single line of code dedicated to any one feature. That said, the feature set that can be supported today is going to be narrow and very specialized, but it’s only a matter of time until the models improve and go beyond their current limitations.
Arguably, while their application is still largely limited to text processing, these models can be applied very successfully to tasks that rely on classification, summarization, or text generation. A very real example of such an application is a generalized template processing system. Such a system could easily be adjusted to dynamically changing requirements. The idea can also be taken to an extreme: such an architecture opens up the possibility of making the end-user experience fully customizable, going well beyond any of the initial design assumptions.
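To make the idea a bit more concrete, here is a minimal sketch of what the integration code in front of the model might look like. Everything in it is an assumption made for illustration: `ModelBackend`, `Feature`, and `fake_model` are hypothetical names, and the model is represented by a plain callable that takes a prompt and returns text, standing in for whatever hosted model API would actually be used. The point it tries to show is that each feature is a piece of data (a plain-language instruction), while the code path is the same for all of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical signature: takes a prompt, returns the model's text output.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class Feature:
    """A product feature described as plain-language instructions, not code."""
    name: str
    instructions: str


class ModelBackend:
    """Thin integration layer: routes every feature through the same model."""

    def __init__(self, model: ModelFn) -> None:
        self._model = model
        self._features: Dict[str, Feature] = {}

    def register(self, feature: Feature) -> None:
        # Adding or changing a feature is a data change, not new feature code.
        self._features[feature.name] = feature

    def handle(self, feature_name: str, payload: str) -> str:
        feature = self._features[feature_name]  # KeyError for unknown features
        prompt = f"{feature.instructions}\n\nInput:\n{payload}"
        return self._model(prompt)


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[model output for {len(prompt)} chars of prompt]"


backend = ModelBackend(fake_model)
backend.register(Feature("fill_template", "Fill the template using the given fields."))
backend.register(Feature("summarize", "Summarize the input in one sentence."))
print(backend.handle("summarize", "A long customer email..."))
```

In a sketch like this, supporting a new requirement would mean registering another `Feature` rather than writing another handler. Whether the model's answers are good enough for that feature is a separate question that the code does not address.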
What would be the benefits of designing a system this way?
- Agility: development speed and time to market matter. Software development today is a time- and resource-intensive process, usually with effort estimated in engineering years. An ML-powered architecture would instead allow product teams to iterate quickly, possibly testing multiple product variants to find the optimal solution, something that is expensive today.
- Flexibility: the system requirements may not be fully known or finalized by the time the product is completed and deployed. This is already true today, since software evolves over time, yet evolving it is a time- and cost-intensive process. For well-established products, it also becomes harder and harder to introduce significant changes as their complexity keeps growing. A product built on top of a model, by contrast, could be adjusted almost instantly.
- Unparalleled customization: not only will the service vendor be able to provide a set of features, but the end user will also be able to customize their own experience. Did the service vendor introduce a breaking change? Not a problem: you, as the customer, could restore the system behavior to its previous state without impacting any other user. Not every product is going to be built this way, but for many there might be little reason not to.
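The customization point can be sketched in a few lines, assuming that product behavior is driven by versioned plain-language instructions handed to the model. The `InstructionStore` class and its methods are hypothetical names invented for this illustration. The vendor publishes new instruction versions, and an individual customer can pin an older one without affecting anyone else.

```python
from typing import Dict, List


class InstructionStore:
    """Vendor-wide instruction versions plus per-user pins (hypothetical design)."""

    def __init__(self) -> None:
        self._versions: List[str] = []
        self._pins: Dict[str, int] = {}

    def publish(self, instructions: str) -> int:
        """Vendor releases a new behavior; returns its version number."""
        self._versions.append(instructions)
        return len(self._versions) - 1

    def pin(self, user_id: str, version: int) -> None:
        """A customer opts to stay on (or return to) a specific version."""
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown version: {version}")
        self._pins[user_id] = version

    def resolve(self, user_id: str) -> str:
        """Instructions to send to the model for this user."""
        if not self._versions:
            raise LookupError("no instructions published yet")
        version = self._pins.get(user_id, len(self._versions) - 1)
        return self._versions[version]


store = InstructionStore()
v0 = store.publish("Reply formally.")
store.pin("alice", v0)                  # alice keeps the old behavior
store.publish("Reply casually.")        # vendor ships a breaking change
print(store.resolve("alice"))           # Reply formally.
print(store.resolve("bob"))             # Reply casually.
```

The design choice here is that a "rollback" is just a lookup of older text, which is only plausible because the behavior lives in instructions rather than in deployed feature code.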
Without a doubt, startups will emerge that embrace this architecture at the core of their product. This will potentially allow them to quickly gain the upper hand over their competitors, particularly those too slow to adapt to the new reality.
Yet we don’t see such an architecture being fully adopted today, and there are good reasons for it. The key points can be summarized as:
- Correctness: it’s no secret that no real-life model achieves 100% accuracy, which might not be acceptable for many applications. Applications that are considered critical, in particular, will definitely not adopt such a solution.
- Cost: today, one million GPT-3 API calls with an output of 4,000 characters each cost 100,000 (one hundred thousand) times more than the same number of AWS Lambda or Google Cloud Functions invocations. The models are cheaper when compared to the cost of human labor, but there is still tremendous room for improvement in cost-effectiveness. Today they simply cannot compete with existing SaaS products. Though if history teaches us anything, it is not to bet against progress.
- Performance: as the input size grows, a model can take a significant amount of time, on the order of seconds, to produce a result. This will still outperform manual human work but will be slower overall than specialized implementations.
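The cost comparison above is simple arithmetic, and it can be sketched as follows. All prices here are assumptions chosen for illustration, not figures taken from this post or guaranteed to match any provider's current pricing: a model price of $0.02 per 1,000 tokens, roughly 4 characters per token, and a serverless price of $0.20 per million invocations (ignoring duration and memory charges). The function names are hypothetical as well.

```python
def model_cost_per_call(
    output_chars: int,
    price_per_1k_tokens: float,
    chars_per_token: float = 4.0,  # rough rule of thumb for English text
) -> float:
    """Estimated cost of one model call, billed by output tokens only."""
    tokens = output_chars / chars_per_token
    return tokens / 1000 * price_per_1k_tokens


def cost_ratio(model_cost: float, function_cost: float) -> float:
    """How many times more expensive the model call is than a function call."""
    return model_cost / function_cost


# ASSUMED placeholder prices; replace with current list prices before relying on this.
MODEL_PRICE_PER_1K_TOKENS = 0.02          # USD, assumption
FUNCTION_PRICE_PER_CALL = 0.20 / 1_000_000  # USD per invocation, assumption

CALLS = 1_000_000
per_call = model_cost_per_call(4_000, MODEL_PRICE_PER_1K_TOKENS)
ratio = cost_ratio(per_call, FUNCTION_PRICE_PER_CALL)

print(f"model:    ${per_call * CALLS:,.2f} per {CALLS:,} calls")
print(f"function: ${FUNCTION_PRICE_PER_CALL * CALLS:,.2f} per {CALLS:,} calls")
print(f"ratio:    about {ratio:,.0f}x")
```

With these assumed inputs, the ratio comes out on the order of one hundred thousand. Different prices, token ratios, or the inclusion of compute-time charges would shift it, so the sketch is best read as a way to redo the estimate rather than as a fixed number.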
A significant amount of research and investment will be needed to improve the capability, performance, and cost-effectiveness of existing models, or to invent completely new ones, before they can fully compete with existing SaaS offerings on all fronts. Even before that happens, they will be able to compete in narrow fields that are not fully automated today, while still being more cost-effective and more efficient than the alternative.
We are still years away from being able to use ML models as fully generalized computation platforms. Despite that, it is already possible to use a generalized ML model to fulfill all of the features of a very specialized product. This should worry software engineers more than ChatGPT generating code or instantly solving coding interview questions.