Spring Cloud: Zuul Trie matcher

Goal

This time I would like to focus on the Spring Cloud Zuul proxy internals in terms of functionality and performance. If we would take a look at the ProxyRouteLocator#getMatchingRoute, this is the method used for finding route per each Zuul proxy call, it has two characteristics, first of all it iterates over list of all services to find the first one matching, second of all it stops when it find the first path that matches the request URI.

If we would attempt to determine the method running time and denote N – as the number of routes and M – as the maximum length of the configured URI path we can deduct that the running time of getMatchingRoute is O(N M). Now that is not bad. But we can do better. There is a very good know data structure that can help us with this goal, a Trie tree. Though a Trie can be represented in multiple ways, we are going to talk here mostly about the simple R-way tree structure that allows to associate a string key with specific value, similary as hashtable does. To simplify a Trie node contains the optional node value and list of links to the children nodes. The actual link to children is being associated with single character, this is why we end up with R-way structure. The down side is that we need extra memory and depending on how we are going to implement the data structure we may need a lot of  that. Simplest approach, is to use an array of object references, with every entry corresponding to a single character, this also resembles the direct access approach in implementing a hashtable.  Let’s consider what would happen if we would want to use alphabet of 256 ASCII codes – we would end up with in 64 bit JVM with 8 bytes per reference and overall 2 KB per node. For unicode this works worse then that with 65536 unique values we would need 512 KB per node.

The direct access technique would be great if it wouldn’t require such large amounts of memory. We could notice that in this specific case we don’t need to handle so many values since in case of URI the RFC3986 clearly states which characters are allowed. Though we can still do better then that. Instead of using a array to store the links why not to use a JDK Map? The Java’s HashMap is hashtable that uses separate chaining for conflict resolution and has default initial capacity of 16 (though this is implementation detail). That would save us a lot of memory for sparse nodes – those with few number of children.

We don’t have to stop there yet. There is one problem with general purpose Map, if we need to store the primitive char type as the map keys – our only option is the wrapper Character class. This will yield at least 4 times more memory only for storing the reference to the Character objects on heap then a actual primitive char array. Fortunately there is 3-rd party implementation of primitive type collections: Trove. The project might be a bit stale, with over 3 years since the last release, but that doesn’t really mather if we can use the existing functionality. The library defines TCharObjectHashMap that in contrary to JDK HashMap implements the open addressing approach and probing for finding the place for the entries. Also on the web you can find interesting comparision of different collection classes.

Implementation

Considering the above introduction I’ve put that into practice and prepared an actual implementation:

https://github.com/jmnarloch/zuul-trie-matcher-spring-cloud-starter

With fallowing Trie implementations:

  • CharArrayTrie
  • HashMapTrie
  • CharHashMapTrie

So what are the gains due to replacement of the standard Spring Clouds Zuul route matching implementation? Our worst case running time is now O(M), where we have defined M – as the maximum length of the URI path, which means that we aren’t bound at all to the configured number of routes, the algorithm will perform the same fast in every case no matter whether you have tens or hundreds routes defined behind your edge gateway.

This is one of the best examples of time vs memory tradeoff I can think off.

Other considerations

The side effect of using the Trie tree is also that now we can find the best matching paths. For instance let consider two examples:

  • /products/**
  • /products/smartphones/**

Now we have a common prefix for both routes that is  the /products/ prefix. The standard implementation for route /products/smartphones/iphone could match either /products/** or /products/smartphones/** depending on the order in the property file, the Trie tree will match always the route that best matches your request URI.

Plans

At this point in time the extension is kept separately and alongside of the previous Cassandra store and Zuul route healthcheck modules that I’ve done and it does not really go along with them. In meaning that there is no way to have both Cassandra with Trie matcher enabled at this time. My goal would be to at least introduce the basic abstraction through pull requests in Spring Cloud Netflix that would allow to externalize and configure parts of Zuul integration and afterwards release the updated version of the modules.

One comment

  1. Pingback: This Year in Spring, 2015-IT大道

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s