Monday, September 19, 2005

Criteria to choose a Caching Component

I had the opportunity to spend quite a long time working with caching on a project at my previous organization. The work was to remove the built-in caching layer, which was based on the Entity Bean cache, and replace it with an open source caching system for performance reasons. So we had to evaluate some caching components. Since a caching system already existed, we knew the requirements exactly and evaluated the components against them. In the end we chose ehcache, and the project met its performance criteria quite easily and was a success.

Having worked extensively with caching components, I have a good idea of how to choose (or write) a good, feature-rich and fully configurable one. If you are about to choose a cache component or write your own, it is good to keep these things in mind.

1. Does the caching component work fine in a multi-threaded environment like Servlets?*** Most of the caching components I have come across do support this, but it is always worth checking; after all, it is a very important requirement. Otherwise, wherever you use the cache you end up writing a synchronized block yourself (user level). If the component supports it by default, you don't need to worry about multi-threading, since the component itself locks the storage while updating and retrieving cache elements (API level).
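
To make the "API level" idea concrete, here is a minimal sketch of a cache that handles thread safety internally, so callers never write a synchronized block. The class name ThreadSafeCache and its methods are made up for illustration; this is not the API of ehcache or any other library, and it relies on the JDK's ConcurrentHashMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration only; not the API of any real caching library.
class ThreadSafeCache<K, V> {
    // ConcurrentHashMap does the locking internally (API level).
    private final Map<K, V> store = new ConcurrentHashMap<>();

    V get(K key) {
        return store.get(key);
    }

    void put(K key, V value) {
        store.put(key, value);
    }

    // Runs the loader at most once per missing key, even with concurrent callers.
    V getOrLoad(K key, Function<K, V> loader) {
        return store.computeIfAbsent(key, loader);
    }
}
```

A caller can then simply write cache.getOrLoad("3435", id -> loadFromDatabase(id)), where loadFromDatabase stands in for whatever lookup your application does, with no locking code of its own.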

2. Does it have a memory limit?*** If you can somehow tell the component that the cache may grow to at most, say, 50MB, it is much easier to keep the cache's memory footprint under control. It is even better if the component can spill the contents over to a file or a database rather than just reporting that the cache has reached its limit.

3. Does it expire the cache contents?*** It is useful if individual elements or the entire cache can expire after a specified period of time rather than living in memory forever.
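
As a rough sketch of what expiry means in practice, the following cache stamps every element with an expiry time and treats it as missing once that time has passed. ExpiringCache, its constructor parameters and the injected clock are assumptions of mine for illustration, not how any particular component implements it; real components usually also run a background cleanup, which is left out here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical illustration only; not the API of any real caching library.
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt; // time in milliseconds after which the entry is stale

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        store.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null for missing or expired elements; expired ones are removed lazily.
    V get(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAt) {
            store.remove(key, entry);
            return null;
        }
        return entry.value;
    }
}
```

In an application you would pass System::currentTimeMillis as the clock; taking it as a parameter just makes the expiry behaviour easy to test.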

4. How good are the cache eviction policies?*** When the cache spills over, does it simply push whatever comes in to secondary storage, or does it move, let's say, the least recently used elements to disk and keep the newly added ones in memory? Is it configurable?
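
For illustration, a least-recently-used policy can be sketched in a few lines on top of the JDK's LinkedHashMap. Note the assumptions: LruCache is a made-up name, the limit is a number of entries rather than megabytes, and the evicted element is simply dropped instead of being spilled to disk as a full caching component might do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the limit is an entry count, not a byte size.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so in a multi-threaded environment this would have to be wrapped, for example with Collections.synchronizedMap.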

5. How are cache element updates taken care of?*** This is the most important point to note. You may not have a problem if the cached data is read-only. But in most scenarios we cache data from a database, and it changes periodically or on user request. If the change is periodic, setting an expiration time helps, and some cache components can even be told how to refresh an element themselves. The ultimate cache update facility! Otherwise, the cache has to be updated manually.

Updating the cache has one more side to it. Let's say you query a cache element like this:

Client.java

....
Employee e = empCache.getEmployee("3435");
....

After this query, if the cache element "3435" is updated in the cache, will the client or whoever references the element get notified? What mechanisms are available for this?
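
One common mechanism is a listener (observer) that the client registers with the cache. The sketch below is only an assumption of how such a facility could look; CacheListener, ObservableCache and onUpdate are hypothetical names and do not describe any specific component.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback interface; names are illustrative only.
interface CacheListener<K, V> {
    void onUpdate(K key, V oldValue, V newValue);
}

// Hypothetical illustration only; not the API of any real caching library.
class ObservableCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final List<CacheListener<K, V>> listeners = new CopyOnWriteArrayList<>();

    void addListener(CacheListener<K, V> listener) {
        listeners.add(listener);
    }

    V get(K key) {
        return store.get(key);
    }

    // Notifies listeners only when an existing element is replaced.
    void put(K key, V value) {
        V old = store.put(key, value);
        if (old != null) {
            for (CacheListener<K, V> listener : listeners) {
                listener.onUpdate(key, old, value);
            }
        }
    }
}
```

A client holding element "3435" could register a listener and re-read the element whenever onUpdate fires for that key.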

6. How small is the memory footprint? How much memory does the cache component occupy to store n objects compared to other components? You may think this depends entirely on the objects we create. Mostly, yes, but if the component is designed badly, it is likely to occupy more space than is required to store the contents. This may not matter to everyone, but it certainly does for memory-intensive applications.

7. Is it a distributed cache? If the application mandates it, you may need a caching system that can be accessed by remote JVMs.

8. Shutdown and restart. If you have to shut down the application and restart it, can the cache be stored in a file or database and restored when the application comes up again? It may not be an important feature, but it is good to have.
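
A very simple way to picture this is to write the cache contents to a file on shutdown and read them back on startup. The sketch below assumes String keys and values and uses plain Java serialization; PersistentCache, saveTo and loadFrom are hypothetical names, and a real component would likely use its own disk store format.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not thread-safe and not any library's API.
class PersistentCache {
    private final Map<String, String> store = new HashMap<>();

    void put(String key, String value) {
        store.put(key, value);
    }

    String get(String key) {
        return store.get(key);
    }

    // Call on shutdown: writes a copy of all entries to the given file.
    void saveTo(Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new HashMap<>(store));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Call on startup: rebuilds a cache from a file written by saveTo.
    @SuppressWarnings("unchecked")
    static PersistentCache loadFrom(Path file) {
        PersistentCache cache = new PersistentCache();
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            cache.store.putAll((Map<String, String>) in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return cache;
    }
}
```

Whether restored data is still valid after a restart is a separate question; it ties back to the expiry and update points above.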

9. How simple/complex is the configuration? Worth exploring this.

10. Dependencies. Though it is always good to choose a component with very few dependencies on other components, you can ignore common ones like log4j, commons, etc., because you will most likely be referencing them in your project already. It is good to count only the new dependencies.

*** - Very important

I am not suggesting that ehcache is the best on all these points; we chose it because it fit our requirements. Some of these points may not be applicable to ehcache.

If you know of some other important feature that should be considered when evaluating a caching component, please reply to me.

In my next post, I shall try to evaluate some of the open source cache components against these criteria.
