Architecture Pattern: Machine Learning Model as a Service Backend.

Over the years, many design patterns have been invented and documented. Currently, a new pattern is emerging that relies on using a generalized machine learning model as the core of a service backend.

Just to make my point clear, the claim is not about developing a very specialized, single-purpose model that can fulfill one, often very narrow, product feature. Solutions like that have existed for a long time and are already well-researched, documented, and tested. The best examples are the recommendation systems that companies like Google, Facebook, Netflix, or Amazon are using and many of us are familiar with. The true novelty is the architecture design pattern in which a generalized machine learning model is used at the core of the service architecture: a model that supports virtually all of the product’s features. The term supports was used here on purpose, as opposed to implements, because in this case the model itself will have little to no code tailored for a particular feature and instead will be generalized enough to handle all of the product requirements.

This opens up completely new possibilities, even for completely dynamic product requirements that could change without major re-architecting and re-implementation; the core of the model would remain virtually unchanged. Not to sound naive or over-optimistic: today, models that are considered state-of-the-art require training and then very meticulous fine-tuning to achieve an acceptable level of accuracy. Such a process will remain mandatory for the foreseeable future. Yet increasingly we are moving towards a world where the models themselves will be more and more capable and could be generalized to the point where they can easily be applied to a wide set of use cases.

For the past two decades, multiple Software as a Service products have been developed, and many of them required solving complex architecture or even distributed systems problems to enable a specific set of features. They require a significant amount of time for design, development, and operation. Many such services were continuously worked on, gradually improving over time as new features were added and bugs were fixed. All the effort put into that development process can be measured in the amount of money, time, and resources it took to build those products. The ones that became most successful have a never-ending stream of new requirements, improvements, and changes.

What if all of that well-established software development process were to become a thing of the past? The emerging generalized Large Language Models open the possibility not only of using them in various interactive applications but, more fundamentally, of building a generalized computation platform that can become a core component in architecting and developing all kinds of commercial products. Since the release of GPT-3 back in 2020, multiple startups have emerged offering products that were enabled only thanks to the invention of Large Language Models.

Yet it appears that there is still a design paradigm from which the industry hasn’t shifted just yet. The current mindset is to develop a product, usually in the form of SaaS, to which the ML model can be considered an add-on. In this post, I am going to claim that it is already possible today to build a product that has a machine learning model at its core, with that model supporting every single product feature. Such a backend model would still be hidden behind an API and, as of today, would still require integration code in front of it. Yet it is the model that performs all of the business-critical logic without a dedicated line of code. That being said, the specific feature set supported today is going to be narrow and very specialized, but it’s only a matter of time until the models improve and go beyond their current limitations.

Arguably, while still limited to text processing, those models can be very successfully applied to various tasks that rely on classification, summarization, or text generation. A very real example of such an application could be a generalized template processing system. Such a system could then be easily adjusted for dynamically changing requirements. It would also be possible to take this idea to an extreme: such an architecture opens the possibility of making the end-user experience fully customizable, going way beyond any of the initial design assumptions.
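To make this concrete, here is a minimal sketch of such a template processor in Java. The model interface, class names, and prompt format below are illustrative assumptions, not a real API; the point is that the only feature-specific artifact left is the instruction handed to the model:

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: the "model" is any function from prompt to completion,
// e.g. a call to an LLM endpoint. No feature logic lives in the service itself.
public class TemplateService {

    // Stand-in for a generalized model (e.g. an LLM completion API).
    private final UnaryOperator<String> model;

    public TemplateService(UnaryOperator<String> model) {
        this.model = model;
    }

    // Every "feature" is just a different instruction; no dedicated code path.
    public String render(String template, Map<String, String> values) {
        String prompt = "Fill the template using the provided values.\n"
                + "Template: " + template + "\n"
                + "Values: " + values + "\n";
        return model.apply(prompt);
    }
}
```

Swapping the instruction text is all it takes to change behavior, which is what makes the dynamically changing requirements mentioned above cheap to accommodate.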

What would be the benefit of designing a system in such a way?

  • Agility — development speed and time to market do matter. Software development today is a time- and resource-intensive process, usually taking effort measured in engineering years to complete. An ML-powered architecture would instead allow product teams to iterate quickly, possibly testing multiple different product variants to find the optimal solution, something that is expensive today.
  • Flexibility — the system requirements may not be fully known or finalized by the time the product has been completed and deployed. This is definitely true even today, as software evolves over time, yet that evolution is a time- and cost-intensive process. For well-established products it also becomes harder and harder to introduce significant changes, as their complexity is constantly growing. A product built on top of a model, by contrast, could be adjusted almost instantly.
  • Unparalleled customization — it would be possible not only for the service vendor to provide a set of features but also for the end user to customize the end-user experience. Did the service vendor introduce a breaking change? Not a problem: you as the customer could restore the system behavior to its previous state without impacting any other user. Not every product is going to be built in such a way, but for many there might be little reason not to do so.

Without a doubt, future startups will emerge that embrace such architecture choices at the core of their product. This could allow them to quickly benefit and gain the upper hand over their competitors, particularly the ones that are too slow to adapt to the new reality.

Yet we don’t see such architecture being fully adopted today, and there are good reasons for it. The key points can be summarized as:

  • Correctness — it’s no secret that no real-life model achieves 100% accuracy, which might not be acceptable for many applications. The ones that are considered critical will definitely not adopt such a solution.
  • Cost — today, one million GPT-3 API calls with an output of 4,000 characters each cost 100,000 (one hundred thousand) times more than the same number of AWS Lambda or Google Cloud Functions invocations. The models are cheap compared to the cost of human labor, but there is still tremendous room for improvement in cost-effectiveness; today they simply cannot compete with existing SaaS products. Though if history teaches us anything, it is not to bet against progress.
  • Performance — with increased input size, a model can take a significant amount of time to produce a result, running into seconds. This will still outperform manual human work but will be overall slower than specialized implementations.

A significant amount of research and investment will be needed to improve the capability, performance, and cost-effectiveness of existing models, or to invent completely new ones, before they can fully compete with existing SaaS offerings on all fronts. But even before that happens, they will be able to compete in very narrow fields that today are not fully automated, and still be cost-effective and more efficient than the alternative.

While we are still years away from being able to use ML models as fully generalized computation platforms, even now it is possible to use a generalized ML model to fulfill all of the features of a very specialized product. This should worry software engineers more than using ChatGPT to generate code or instantly solve coding interview questions.

Using Amazon EventBridge for Automated Event Reconciliation

Amazon EventBridge announced last week support for event Archiving & Replay, whose primary goal is to help users with disaster recovery and to guarantee that producers and consumers can reconcile events and, through that, their state.

EventBridge itself, by principle, should not drop event delivery; in case of failure, delivery is by default retried for up to 24 hours, only after which the event is finally dropped. This behaviour is now also configurable through the RetryPolicy and can be adjusted per configured target.

Besides that, there are still cases in which an event delivered by the service might end up dropped on the floor. Some typical cases:

  • A bug in the destination Lambda function that does not have a DLQ configured.
  • Deleting or disabling the rule.
  • Misconfigured Rule input transformer.

The above are only scenarios specific to AWS; in general, there might be use cases in which reconciliation of events is actually part of the business requirements, like guaranteeing that the warehouse inventory is up to date or that all of the invoices at a given point in time have been processed.

Arguably, whenever an incident happens, the problem is to detect that scenario and then recover every event that hasn’t been processed. For the former, CloudWatch alarms on failed invocations, Lambda function execution errors, or missing data points can help; for the latter, the archive created on the EventBridge event bus can be used to replay all of the events from the beginning of the incident and restore the consumer state.

This can be done effectively provided that the consumer can process the events idempotently, as replaying the events might also mean delivering duplicates of previously delivered events.
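As a sketch of what idempotent processing can look like, assuming each event carries a unique id (EventBridge events do), a consumer can simply skip ids it has already seen. This is an illustration only; a production system would persist the seen ids in a durable store (e.g. DynamoDB) rather than in memory:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Minimal idempotent-consumer sketch: replayed duplicates become no-ops.
public class IdempotentConsumer {

    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler;

    public IdempotentConsumer(Consumer<String> handler) {
        this.handler = handler;
    }

    public void onEvent(String eventId, String payload) {
        // add() returns false if the id was already seen, making replays safe
        if (processed.add(eventId)) {
            handler.accept(payload);
        }
    }
}
```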

Interestingly enough, the process of replaying events doesn’t necessarily need to happen only in case of failure; rather, the service itself can be designed so that event reconciliation happens on a regular basis, for instance every day, week, or month, without any manual work. In such cases it is possible to build a system that is able to self-reconcile. This can be done today with actually little effort.

EventBridge allows configuring scheduled events that run based on a cron expression. The event can then be wired to a Lambda function that will be responsible for starting the replay. An example application that does exactly that can be found in the GitHub repo.

You can simply clone it and deploy it using SAM.

$ git clone
$ cd aws-eventbridge-replay-scheduler
$ sam build
$ sam deploy --guided

The CloudFormation template defines a cron expression that configures how often the automated replay should be triggered, and a Lambda function. The function’s logic uses the same cron expression to compute the tumbling window for the automated replay of events.

The implementation is quite straightforward:

const AWS = require('aws-sdk');
const parser = require('cron-parser');
const uuid = require('uuid');

const events = new AWS.EventBridge();

exports.lambdaHandler = async (event, context) => {
    const archive = process.env.AWS_EVENTBRIDGE_ARCHIVE_ARN;
    const eventBus = process.env.AWS_EVENTBRIDGE_EVENT_BUS_ARN;
    const schedule = process.env.AWS_EVENTBRIDGE_REPLAY_SCHEDULE;

    // the two previous fire times of the schedule bound the tumbling window
    let cron = parser.parseExpression(schedule);
    let replayEndTime = cron.prev();
    let replayStartTime = cron.prev();

    console.log("Replaying events from %s to %s with event time between [%s, %s]",
        archive, eventBus, replayStartTime, replayEndTime);

    await events.startReplay({
        ReplayName: uuid.v4(),
        EventSourceArn: archive,
        Destination: {
            Arn: eventBus
        },
        EventStartTime: replayStartTime.toDate(),
        EventEndTime: replayEndTime.toDate()
    }).promise();

    return {};
};

Some scheduling examples:

Running the reconciliation every day at 1 AM.

0 1 ? * * *

Running the reconciliation every week on Sunday at 1 AM.

0 1 ? * SUN *

Running the reconciliation on the last day of the month at 1 AM.

0 1 L * ? *

Interestingly enough, this also allows implementing a delayed event delivery use case, provided that the events are not processed on first publishing.

What are the tradeoffs of executing the reconciliation on a schedule? The biggest gain is that you get zero-touch operations by design: you don’t require manual intervention during disaster recovery to restore the state. Though there will be cases in which the implied additional time needed for event recovery is unacceptable and immediate means of replaying the events are necessary. The downside of continuously replaying events is the additional cost, though that can be kept in check, as the replay uses a tumbling window to never replay the events more than once.

One idea for future improvements is to make the Lambda function that triggers the replay maintain state, so that it is guaranteed a replay will never be repeated for the same time window, and that in case of failure all missed time windows will be covered.

On the EventBridge side, an interesting idea would be to allow configuring Replay as a Target, to trigger it based on a scheduled event without the need to write any code.

Spring Boot RxJava 2

Last month the RxJava 2 GA version was released:

The project has been reworked to support the emerging JVM standard: Reactive Streams

Thanks to a contribution from Brian Chung, the small side project that I initially authored, which adds support for returning the reactive types Observable and Single from Spring MVC controllers, now supports RxJava 2.

While Spring itself will support Reactive Streams mostly through its own Project Reactor, RxJava will still have support through different projects. For instance, the latest Spring Data will allow designing repositories with built-in support for RxJava types.

At the API level, the most significant change in the RxJava Spring Boot starter is the package change: it now supports types from io.reactivex.* instead of rx.*. Besides that, usage is fairly similar.

Simply add the library to your project:


You can use the RxJava types as return types in your controllers:

@RestController
public static class InvoiceResource {

    @RequestMapping(method = RequestMethod.GET, value = "/invoices", produces = MediaType.APPLICATION_JSON_UTF8_VALUE)
    public Observable<Invoice> getInvoices() {

        return Observable.just(
                new Invoice("Acme", new Date()),
                new Invoice("Oceanic", new Date())
        );
    }
}

If you are looking for a more detailed description of migrating to RxJava 2, here is a comprehensive guide.

Spring Cloud Stream: Hermes Binder


Spring Cloud Stream is an interesting initiative for building message-driven applications in the wider Spring ecosystem. I think the main idea is to ease usage and reduce configuration to the bare minimum, compared to a more complex solution such as Spring Integration.

Altogether, Spring Cloud Stream introduces the idea of binders, which are responsible for handling the integration with different message-oriented middleware (MOM), at the moment having out-of-the-box support for:

  • RabbitMQ
  • Kafka
  • Redis
  • GemFire

For additional information I highly recommend going through the Spring Cloud Stream reference guide.

Allegro Hermes is a message broker built on top of Kafka, with a REST API that allows easy integration with HTTP-based clients. It also has a rich set of features, allowing it to pass JSON and binary Avro messages, as well as broadcast messages or send them in batches.

In order to be able to consume it through Spring Cloud Stream, we need to provide a dedicated binder that will be able to deliver the messages to Hermes.

Fortunately there is one here:


Let’s try to use it in practice, starting from a sample project. You may want to first go through the Hermes quickstart guide to set up your environment.

Next, we will download a Spring Initializr template using httpie.

$ http -f POST type=gradle-project style=cloud-stream-binder-kafka >

$ unzip

Afterwards you can import the project using your favorite IDE.

The first thing to do is to replace the spring-cloud-starter-stream-kafka dependency with the Hermes binder:


Let’s start by configuring the Hermes URI for the binder.

          uri: ''

Now we can design our binding and the POJO used for the message.


import java.math.BigDecimal;
import java.util.UUID;

public class PriceChangeEvent {

    private final UUID productId;

    private final BigDecimal oldPrice;

    private final BigDecimal newPrice;

    public PriceChangeEvent(UUID productId, BigDecimal oldPrice, BigDecimal newPrice) {
        this.productId = productId;
        this.oldPrice = oldPrice;
        this.newPrice = newPrice;
    }

    public UUID getProductId() {
        return productId;
    }

    public BigDecimal getOldPrice() {
        return oldPrice;
    }

    public BigDecimal getNewPrice() {
        return newPrice;
    }
}

And the binding for the message channel.


import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;

public interface Events {

    @Output("priceChanges")
    MessageChannel priceChanges();
}

Through configuration we can specify the destination topic name and the default content type of the topic.

spring:
  cloud:
    stream:
      bindings:
        priceChanges:
          destination: 'io.jmnarloch.price.change'
          contentType: 'application/json'

In order to enable Spring Cloud Stream binding we need to annotate our configuration class.

@Configuration
@EnableBinding(Events.class)
public class EventsConfiguration {
}

Using the binding is straightforward: a proper proxy is going to be created and can afterwards be injected.

@Component
public class EventsProducer {

    private final Events events;

    public EventsProducer(Events events) {
        this.events = events;
    }

    public void publishPriceChange(PriceChangeEvent event) {

        events.priceChanges().send(new GenericMessage<>(event));
    }
}

Finally, we can publish our message:

eventsProducer.publishPriceChange(new PriceChangeEvent(uuid, oldPrice, newPrice));

At the moment the binder itself is still under development, but this already presents a workable example.

Publishing Avro binary messages is almost as simple as publishing JSON ones, and I’m going to cover that in a following blog post.

Spring Boot: RxJava Declarative Schedulers

As a follow-up to last week’s article, Spring Boot: RxJava, there is one additional project:

Setup, as with most Spring Boot starters, is fairly simple: you just drop the dependency on your project classpath and you are all set:


The library brings one piece of functionality: it allows specifying the Scheduler on the RxJava reactive types rx.Observable and rx.Single in Spring’s declarative manner, through annotations.

The basic use case is to annotate your bean methods with either the @SubscribeOnBean or @SubscribeOn annotation.


    public class InvoiceService {

        public Observable<Invoice> getUnprocessedInvoices() {
            return Observable.just(

The motivation here is to ease the integration with Spring Framework and to be able to define an application-level scheduler within the DI container. Why would you want to do that? There are a couple of use cases.

For example, you might need to provide a custom scheduler that is aware of ThreadLocal variables. A typical use case is passing the logging MDC context, so that afterwards the thread running within the RxJava Scheduler can access the same context as the thread that triggered the task, but the applications go beyond that.

Another typical example is customizing your scheduler rather than relying on the built-in ones, for instance to limit the thread pool size, considering that the built-in schedulers like the IO scheduler are unbounded.

In case you want to simply rely on the RxJava predefined schedulers, you can still use them with the @SubscribeOn annotation.

    public class InvoiceService {

        // one of the predefined RxJava schedulers
        @SubscribeOn(Scheduler.IO)
        public Observable<Invoice> getInvoices() {
            return Observable.just(
                    new Invoice("Acme", new Date()));
        }
    }

Spring Boot: RxJava

Back to posting. This one will be a bit dated, since I had been working on this integration back in February.

Interestingly enough, I had already prepared a blog post related to this feature within Spring Cloud, which added tight RxJava integration with Spring MVC controllers. Remarkably, it turned out that the implementation in one of the previous milestones had a flaw in it.

I’ve been interested in trying to find a solution to the problem, though I was keen to support mostly the widely used REST-like approach in which (mostly) the entire payload is returned upon computation, in contrast to streaming the response over HTTP. This approach has been reflected in this small project:

It worked out very well as a reference project, giving me the opportunity to try out different API implementations. You can also use it in your own project, since the proposed implementation depends only on Spring Boot and Spring Framework’s HandlerMethodReturnValueHandler, so if you are simply using Spring Boot without the additional features provided through Spring Cloud, feel free to test it out.

Later, the code of the project became a baseline for the implementation proposed to Spring Cloud.

Spring Cloud approach

The final approach that has been implemented in Spring Cloud is a bit different. First of all, the support for rx.Observable has been removed; instead you can use rx.Single in a similar manner to DeferredResult, to which the underlying implementation in fact maps the RxJava type. The reference guide describes this in a bit more detail:

Funny enough, at my company one of my colleagues later had to migrate the code of one of the projects from rx.Observable to rx.Single, which he wasn’t really happy about 😉

Spring Boot: Tuning your Undertow application for throughput

It’s been some time since the previous blog post, but I finally thought it’s a good time to write about a very useful and practical aspect: how to prepare your Spring Boot application for production, and how to guarantee that it will be able to handle a couple of million views each day.

If you think that you have already made all the needed steps by making your application stateless, scaling it out, or running it on high-end machines, think twice, because it’s quite likely that there are some bottlenecks inside your application that, if not treated with proper attention, would degrade its performance and overall throughput.

Tuning for latency vs tuning for throughput.

Interestingly enough, in the past, being aware of Little’s Law, I thought that tuning your application for throughput requires nothing more than reducing your application latency as much as possible. It was only after reading the book Java Performance that I realized this might not be true in all cases.

Generally, you can improve latency first by improving your application’s algorithmic performance. After that, you should take a look at the access patterns in your application: introducing a caching layer or redesigning the way your application accesses data can have a huge impact on overall performance. If your application is heavily I/O bound, performing operations in parallel can improve things a bit.
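For illustration, a small sketch of parallelizing two independent I/O-bound lookups with CompletableFuture; the lookups here are stubs standing in for real HTTP or database calls:

```java
import java.util.concurrent.CompletableFuture;

// Two independent lookups executed in parallel: total latency approaches
// max(a, b) instead of a + b when the calls are truly independent.
public class ParallelLookup {

    static CompletableFuture<String> fetchUser() {
        return CompletableFuture.supplyAsync(() -> "user-42");  // e.g. an HTTP call
    }

    static CompletableFuture<String> fetchOrders() {
        return CompletableFuture.supplyAsync(() -> "3 orders"); // e.g. a DB query
    }

    public static String profilePage() {
        // combine both results once both futures have completed
        return fetchUser().thenCombine(fetchOrders(),
                (user, orders) -> user + ": " + orders).join();
    }
}
```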

Another good idea for improving your application latency is to configure asynchronous logging; whether you use Logback or Log4j 2, both of them provide this functionality.
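For Logback, a minimal sketch of such a configuration wraps an existing appender in an AsyncAppender; the appender names, file path, and pattern below are just an example:

```xml
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>app.log</file>
    <encoder>
      <pattern>%d %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- log events are queued and written by a background thread -->
  <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="FILE"/>
  </appender>

  <root level="INFO">
    <appender-ref ref="ASYNC"/>
  </root>
</configuration>
```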

Thread pools


Undertow uses XNIO as the default connector. XNIO has some interesting characteristics, starting with its default configuration, in which the number of I/O threads is initialized to the number of your logical cores and the number of worker threads to 8 × CPU cores. So on a typical 4-core Intel CPU with hyper-threading you will end up with 8 I/O threads and 64 worker threads. Is this enough? Well, it depends. In comparison, Tomcat’s and Jetty’s defaults are 100 and 1000 threads respectively. If you need to handle more requests per second, this is the first thing to consider increasing.
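As a reference, a sketch of how these pools can be sized from Spring Boot configuration. The property names below are the ones exposed by Spring Boot’s Undertow settings (newer Boot versions moved them under server.undertow.threads.*), and the numbers are only an illustration, not a recommendation:

```yaml
# application.yml sketch: explicit Undertow thread pool sizes
server:
  undertow:
    io-threads: 16
    worker-threads: 256
```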


The Hystrix documentation states that:

Most of the time the default value of 10 threads will be fine (often it could be made smaller).

After working with a couple of projects, I find it hard to believe that this could be a true statement. The default for Hystrix is 10 threads per pool, which might quickly turn out to become a bottleneck. In fact, the same documentation also states that in order to establish the correct size of a Hystrix thread pool you should use the following formula:

requests per second at peak when healthy × 99th percentile latency in seconds + some breathing room

So let’s assume that you have a system that has to handle, let’s say, 24,000 rps. Divided by the number of instances, for instance 8, you can establish the appropriate pool size for a single instance. This will vary greatly with the latency of your system.
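A quick worked example of that formula; the 10 ms 99th-percentile latency and the breathing room of 5 threads are assumed numbers, purely for illustration:

```java
// Hystrix sizing formula: peak rps per instance × p99 latency (s) + headroom
public class PoolSizing {

    public static int poolSize(int peakRpsPerInstance, double p99LatencySeconds, int breathingRoom) {
        return (int) Math.ceil(peakRpsPerInstance * p99LatencySeconds) + breathingRoom;
    }

    public static void main(String[] args) {
        int perInstance = 24_000 / 8;                        // 3 000 rps per instance
        System.out.println(poolSize(perInstance, 0.010, 5)); // prints 35
    }
}
```

With a 10 ms p99 that comes out to about 30 in-flight requests, so a pool of roughly 35 threads; a slower downstream dependency inflates this number quickly, which is why the latency term dominates the sizing.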


Memory usage

All of this does not come without a price. Each newly allocated thread consumes memory. In Java you can configure the thread stack size through the -Xss property, with the default for a 64-bit VM being 1 MB. So if you, let’s say, configure your Undertow thread pool with 512 worker threads, be ready for your memory consumption (just for the thread stacks) to grow by roughly 512 MB.

Connection pools


Do you use, for instance, RestTemplate, or maybe the RESTEasy JAX-RS client? In fact, there is a well-known issue reported in RESTEasy where by default exactly ONE connection is used for all of your calls. Good advice is to align the connection pool size with the number of worker threads of your application server; otherwise, when performing HTTP calls, the threads will wait to acquire an underlying HTTP connection from the pool, which will cause unnecessary and probably unintended delay.


The same basic principle applies to any other kind of service communicated with over a TCP connection. For instance, Memcached clients like XMemcached have the nice capability of using a multiplexed TCP connection with a binary protocol on top of it, giving a throughput of roughly 50 requests per connection; though if you need to handle greater throughput, you still need to configure your client to maintain an entire pool of connections.

Garbage collection

If you opt for low latency, you should probably consider optimizing the garbage collector as a last resort. As much as garbage collection can be optimized through different settings, it does not address the true problem; if you can address those issues first, you should be just fine, and you can tune the garbage collector afterwards for the best overall performance.

Final thoughts

How will you be able to tell whether your application faces any of those problems? First of all, equip yourself with the proper tools. Stress tests are one of them: you can either treat the application as a black box and use, for instance, Gatling to measure its throughput, or, if you need more fine-grained tools, the jmh project can be used to run benchmarks of individual Java methods. Finally, use a profiler to understand where your application spends the most time: is it, for instance, a RestTemplate call, or maybe your cache access time skyrockets? Good advice for measuring the characteristics of the application is to use a doubling approach: run your benchmark with, for instance, 64 RPS, monitor the results, and repeat the experiment with double the request rate. Continue as long as you haven’t reached the desired throughput level.

With all of this being said, the truth is that the above describes the hard way. There is also a simple and fast path to solving your heavy-load problems, especially for HTTP:

Use a caching reverse proxy.

Whether it’s Nginx or Varnish, both of them should take load off your backing services, and if you can decrease the load, you don’t need to spend so much time on optimizations.

CompletableFuture cache

This post is going to be more theoretical, describing the idea of asynchronous caches, or promise caches, at a conceptual level.

How to cache promises?

Earlier this year I had been working on a small service whose entire implementation was based on promises, Java 8’s CompletableFuture to be exact. The nice feature this provides is the ability to compose multiple asynchronous operations that can execute in parallel. Though we quite quickly found out that even though we had been given a very powerful tool, we had to give up some others, for instance caching.

One can argue that there is a simple solution for that: implement explicit caching functionality that checks whether a specific value exists in the cache and simply returns it wrapped in a future, or otherwise executes the application logic and populates the cache afterwards.

Unfortunately, we didn’t find such a solution satisfying; I would prefer something more subtle. I’ve even spent some time working on a PoC of a CompletableFuture cache, ending with a workable solution (though I was really treating that as a form of exercise, and if you are looking for something to run in production there are probably better options):

It was shortly after that I discovered I hadn’t been the first to come up with such an idea, and that there are already existing implementations of caches capable of storing promises.

  • Twitter Util has a cache for Twitter’s Futures in Scala
  • Caffeine has AsyncLoadingCache for Java 8’s CompletableFuture
  • Spray can cache Scala’s Futures
  • RxCache for caching rx.Observables

The whole idea can be generalized and named a promise cache or asynchronous cache. If we had to describe the characteristics of such a cache, we can mention a few:

  • It caches promises rather than plain values
  • It requires associating a unit of work with the cached value
  • It caches not only completed task results, but also the still “running” tasks
  • It gracefully handles task cancellation

To expand on that description: whenever a new entry is added to such a cache, the cache returns a promise of the value. So in most cases we provide it with a task to execute, in the form of a lambda expression or a callable for instance, and in exchange we expect that it will return a promise of the result. We can distinguish three different states of an entry in the cache:

  • No entry exists for the specific key
  • A new entry has been inserted for the key, but the task is still being executed by a thread in the background
  • An entry exists and is the result of the computation, wrapped in a promise.

In other words, the cache has one particularly interesting characteristic: it has to be able to observe the supplied task and “capture” the result of its computation in order to store it and return it when requested. This has some interesting implications. If we consider a typical use case for caching, for instance a database query or a long-running HTTP request to a remote service, the asynchronous cache has one huge advantage: the task is provided for execution once, and until it completes, every request can observe the same promise. This uses system resources efficiently. In a typical scenario, for instance when running in a web server, that gives us a huge advantage over a blocking solution, because we can guarantee that at a given time (on a single server) exactly one background thread is executing the given task, and, more importantly, all of the requests accessing the same cache entry can be processed asynchronously and observe the same single task for completion, without blocking.
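The idea can be sketched in a few lines on top of ConcurrentHashMap and CompletableFuture. This is a conceptual illustration only (no expiry, no size bound), not a substitute for a real implementation like Caffeine’s:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Promise-cache sketch: computeIfAbsent stores the *future* immediately, so
// concurrent callers for the same key observe one shared in-flight task
// instead of each starting their own. Failed tasks are evicted for retry.
public class PromiseCache<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();

    public CompletableFuture<V> get(K key, Function<K, V> loader) {
        return cache.computeIfAbsent(key, k ->
                CompletableFuture
                        .supplyAsync(() -> loader.apply(k))
                        .whenComplete((value, error) -> {
                            // keep only successful results cached
                            if (error != null) {
                                cache.remove(k);
                            }
                        }));
    }
}
```

The key move is that the value stored in the map is the promise itself, which covers all three entry states listed above with a single data structure.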

Let’s take a look at AsyncLoadingCache from Caffeine as an example of the API of such a cache:

public interface AsyncLoadingCache<K, V> {

  CompletableFuture<V> getIfPresent(@Nonnull Object key);

  CompletableFuture<V> get(@Nonnull K key,
      @Nonnull Function<? super K, ? extends V> mappingFunction);

  CompletableFuture<V> get(@Nonnull K key);

  void put(@Nonnull K key, @Nonnull CompletableFuture<V> valueFuture);

  // remaining methods omitted
}


Although the API defines the well-known put/get methods, a typical use of an asynchronous cache is to call the get method that accepts the task to execute.

The guarantee of at most one executing task per key is a huge advantage and can be very useful in situations where, for instance, one long-running task or request could saturate the web server’s thread pool. Introducing the asynchronous cache can help in multiple different scenarios, from the already mentioned database queries to inserting data, and it can easily be used to guarantee the idempotency of an HTTP request (at least on a single node): we won’t duplicate work by executing the same task multiple times.

Once the task completes, the cache needs to intercept that “event” and store the result on successful completion. This can easily be done with the CompletableFuture#completedFuture method. In case of an error, the entry has to be evicted from the cache.
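The eviction-on-error behaviour can be sketched with whenComplete, which lets the cache observe both outcomes of the task. The class below is my own illustration, not a library API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class EvictingPromiseCache<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();

    public CompletableFuture<V> get(K key, Function<? super K, ? extends V> task) {
        return entries.computeIfAbsent(key, k -> {
            CompletableFuture<V> future = CompletableFuture.supplyAsync(() -> task.apply(k));
            // Observe the completion "event": successful results stay cached,
            // failed entries are evicted so the task can be retried later.
            future.whenComplete((result, error) -> {
                if (error != null) {
                    entries.remove(k, future);
                }
            });
            return future;
        });
    }

    // Returns the promise stored for the key, or null when no entry exists.
    public CompletableFuture<V> getIfPresent(K key) {
        return entries.get(key);
    }
}
```

Note that this sketch has a small race window: a task that fails before computeIfAbsent publishes the entry would attempt the removal before the entry is stored, so a production implementation needs to close that gap.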

The problem of global task cancellation

There is one interesting edge case: what if a promise that has been supplied to the cache has not yet completed, and one of the clients requests its cancellation? Unless this is handled by the cache implementation, it can be a very destructive operation, since every other client waiting for the same computation result will be affected. Unfortunately, in the case of Java’s CompletableFuture, this is an existing problem. JDK 9 will introduce a CompletableFuture#copy method that will give a way to work around it, but until then implementations like Caffeine cannot handle such situations gracefully: CompletableFuture simply does not expose the proper API for such cases.
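Once JDK 9 is available, the workaround could look roughly like this: the cache keeps the shared future private and hands out defensive copies created with CompletableFuture#copy, so a client cancelling its copy cannot affect the other observers. A minimal sketch:

```java
import java.util.concurrent.CompletableFuture;

public class DefensiveCopyDemo {

    public static void main(String[] args) {
        // The future stored inside the cache, shared by all clients.
        CompletableFuture<String> shared = new CompletableFuture<>();

        // Each client receives a copy that completes together with the
        // shared future (CompletableFuture#copy, JDK 9+).
        CompletableFuture<String> clientView = shared.copy();

        // One client cancels its view...
        clientView.cancel(true);

        // ...but the shared future is untouched and still completes
        // normally for everybody else.
        shared.complete("result");
        System.out.println(shared.join());            // prints "result"
        System.out.println(clientView.isCancelled()); // prints "true"
    }
}
```

Cancelling the copy only completes that dependent stage exceptionally; the source future, and therefore every other client’s copy, is unaffected.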

Beyond single instance

This is pure speculation on my side, but I can imagine moving this idea well beyond caching within a single program instance. It would be interesting to see a distributed system built on top of the concept of the asynchronous cache, in which (with a configurable consistency level) it would be possible to guarantee that in the whole server cluster at most one task is being executed for a specific input value, while any other node could observe the task for completion in a non-blocking manner. That would surely be an idea worth implementing.

Spring Framework 4.3: AsyncRestTemplate interceptors

After a short break I would like to come back with a very interesting topic. It is not often that you get to describe an upcoming feature of a widely used library like Spring. Last year I co-authored a really simple feature that adds to Spring’s AsyncRestTemplate a much-needed extension point: interceptors. So I would like to take the liberty of describing them in more depth.

This topic might not be useful for the majority of AsyncRestTemplate use cases, unless you are developing frameworks or libraries yourself and are looking for seamless integration. The contract of the interceptor follows its RestTemplate counterpart as closely as possible.

public interface AsyncClientHttpRequestInterceptor {

    ListenableFuture<ClientHttpResponse> intercept(HttpRequest request, byte[] body,
            AsyncClientHttpRequestExecution execution) throws IOException;
}

The major difference is that instead of returning the response object, the interceptor has to work on a ListenableFuture: an observable promise that will eventually yield an HTTP response.

A minimal implementation that intercepts the response requires adding a callback to the ListenableFuture. Example:

public class AsyncRequestInterceptor implements AsyncClientHttpRequestInterceptor {

   @Override
   public ListenableFuture<ClientHttpResponse> intercept(HttpRequest request, byte[] body,
         AsyncClientHttpRequestExecution execution) throws IOException {

      ListenableFuture<ClientHttpResponse> future = execution.executeAsync(request, body);
      future.addCallback(
            resp -> {
               // do something on success
            },
            ex -> {
               // process error
            });
      return future;
   }
}

Why is introducing the interceptors important or useful at all? If we take a look at the existing RestTemplate functionality provided through Spring Cloud Netflix, Spring Cloud Commons, Spring Cloud Security, or Spring Cloud Sleuth, we can list a number of interesting applications:

  • Ribbon client load balancing – this is in fact done through a ClientHttpRequestFactory, though a ClientHttpRequestInterceptor would be sufficient to achieve the same result.
  • Spring Cloud Security – uses them to add load balancing to the OAuth2RestTemplate.
  • Spring Cloud Sleuth – uses them to add tracing headers to outgoing requests.

Some other example use cases:

  • Request/response logging
  • Retrying requests with a configurable back-off strategy
  • Altering the request URL

You can expect this functionality to be available with the release of Spring Framework 4.3 and Spring Boot 1.4. Since open source projects have some inertia in development, any integration built on top of it, for instance in Spring Cloud, probably won’t be available until the 1.2 release.