Wednesday, September 28, 2005

Caching web pages

Web page caching falls into two major categories:

1. Caching the entire response
2. Caching a part of response

Caching the response

This kind of caching suits news sites, stock tickers, and the like. In this approach, the entire web response is cached for a configured period of time. Once that period expires, the next request causes the page to be refreshed, cached again, and served back to the client. This applies to static as well as dynamic pages. It is achieved through Servlet Filters.

Let us assume we have a JSP, "DisplayInventory.jsp", which displays inventory details read from the database, and that the inventory details change about every half an hour. In this scenario, we can cache the response from DisplayInventory.jsp for half an hour. The response is cached on the first hit to the page and served from the cache for subsequent requests. When a request arrives after the half hour has passed, the page is executed once again and the modified data is read and displayed.

This is quite easy to implement with Servlet Filters, and most application servers already ship filters of this kind. The following example shows how to configure one for WebLogic Server.

web.xml:
<filter>
  <filter-name>InventoryCache</filter-name>
  <filter-class>weblogic.cache.filter.CacheFilter</filter-class>
  <init-param>
    <param-name>timeout</param-name>
    <!-- Cache expiry setting: half an hour (a bare number is read as seconds) -->
    <param-value>30m</param-value>
  </init-param>
  <init-param>
    <param-name>verbose</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>

<!-- Maps the jsp with the filter -->
<filter-mapping>
  <filter-name>InventoryCache</filter-name>
  <url-pattern>/DisplayInventory.jsp</url-pattern>
</filter-mapping>
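
If your server does not ship such a filter, the bookkeeping behind a whole-response cache can be sketched in plain Java. This is only a minimal sketch, not the WebLogic implementation: the class name TimedResponseCache, its get method, and the renderer argument are names I made up for illustration. A real filter would also have to capture the response output and headers, which is left out here, and the current time is passed in by the caller so that expiry can be tried out without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: caches one rendered page body per URL for a fixed time.
class TimedResponseCache {
    // One cached page body plus the moment it stops being valid.
    private static final class Entry {
        final String body;
        final long expiresAtMillis;

        Entry(String body, long expiresAtMillis) {
            this.body = body;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    TimedResponseCache(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Returns the cached body for the URL and re-renders it only when the
    // entry is missing or expired. Two simultaneous requests for an expired
    // page may both render it; the later put simply wins.
    String get(String url, long nowMillis, Supplier<String> renderer) {
        Entry e = entries.get(url);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            e = new Entry(renderer.get(), nowMillis + timeoutMillis);
            entries.put(url, e);
        }
        return e.body;
    }

    public static void main(String[] args) {
        long minute = 60 * 1000L;
        TimedResponseCache cache = new TimedResponseCache(30 * minute);
        int[] renders = {0}; // counts how often the "page" is executed
        Supplier<String> page = () -> "inventory v" + (++renders[0]);

        System.out.println(cache.get("/DisplayInventory.jsp", 0, page));           // inventory v1
        System.out.println(cache.get("/DisplayInventory.jsp", 10 * minute, page)); // inventory v1
        System.out.println(cache.get("/DisplayInventory.jsp", 31 * minute, page)); // inventory v2
    }
}
```

The second request is served from the cache, and only the request after the half hour executes the page again, which is the behaviour described for DisplayInventory.jsp above.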

Caching a part of response

Let us say your JSP reads data from the database, or does some calculation to produce the response, in only a couple of places. You can cache just these parts instead of the entire response. This is normally done using tag library tags: mark the area in your JSP with the cache tags, specify the timeout period, and the rest is taken care of. Writing a caching tag library is easy too, although most application server vendors already ship a variety of caching tags.

The following example is for WebLogic Server.

Let us assume this is how DisplayInventory.jsp is written:
Header
.....

Body
.....
Some formatting

<wl:cache name="invCache" timeout="30m">
    Now, there are some tag lib tags
    which are used to read data from the database
</wl:cache>

End formatting

End Body


To clear the cache from some other part of the JSP:
<wl:cache name="invCache" flush="true"/>

Both approaches accept some more parameters for fine-tuning the cache.

Tuesday, September 27, 2005

Cache components features comparison

As I mentioned in my previous post, this is how some of the open source components compare in terms of features.
Features/Cache                             EHCache   JCS   JBoss Cache

Multithreading support                     Yes       Yes   Yes
Memory limit and overflow to disk          Yes       Yes   Yes
Cache eviction policy                      Yes       Yes   Yes
Notifies by listening to database change   No        No    No
Updates the references                     No        No    No
Distributed cache                          Yes       Yes   Yes
Shutdown and restart                       Yes       No    Yes
Replication over machines                  No        No    Yes
Asynchronous operations                    Yes       No    Yes

Design Considerations

I wouldn't hesitate to wrap whichever cache component I use in my project, because these little components tend to change drastically. If a component is very small, or very big, we may ask whether it is necessary to wrap it at all. Log4j, for example, I wouldn't wrap; I would rather decide to live with it. For a big component, such as an Oracle database, you may likewise decide to live with it, not bother considering alternatives, and code accordingly. But not with these cache components. Your project should be able to embrace a change in the component, or a new cache component for that matter.
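
To make the wrapping idea concrete, here is a minimal sketch in Java. All the names (AppCache, InMemoryAppCache, InventoryService, loadFromDatabase) are hypothetical and invented for this post; none of them come from EHCache, JCS, or JBoss Cache. The in-memory class is only a stand-in for an adapter that would delegate to whichever component you picked.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-facing interface; the rest of the project
// talks to this and never to the cache vendor's classes.
interface AppCache<K, V> {
    V get(K key);
    void put(K key, V value);
    void remove(K key);
}

// Stand-in implementation backed by a plain map. In a real project this
// class would delegate to the chosen cache component instead.
class InMemoryAppCache<K, V> implements AppCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    public V get(K key) { return store.get(key); }
    public void put(K key, V value) { store.put(key, value); }
    public void remove(K key) { store.remove(key); }
}

// Application code depends only on AppCache.
class InventoryService {
    private final AppCache<String, Integer> cache;

    InventoryService(AppCache<String, Integer> cache) {
        this.cache = cache;
    }

    int stockLevel(String itemId) {
        Integer cached = cache.get(itemId);
        if (cached != null) {
            return cached;
        }
        int fromDb = loadFromDatabase(itemId);
        cache.put(itemId, fromDb);
        return fromDb;
    }

    // Placeholder for a real database query.
    int loadFromDatabase(String itemId) {
        return itemId.length() * 10;
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        InventoryService service = new InventoryService(new InMemoryAppCache<>());
        System.out.println(service.stockLevel("abc")); // 30
    }
}
```

Swapping the cache component then means writing one new AppCache implementation, while InventoryService and the rest of the application stay untouched.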

Monday, September 19, 2005

Criteria to choose a Caching Component

I had an opportunity to spend quite a long time working with caching in one of my previous projects at my previous organization. The work was to remove the built-in caching layer, which was based on the Entity Bean cache, and replace it with an open source caching system for performance reasons. So we had to evaluate some of the caching components. Since a caching system already existed, we were aware of the exact requirements, and we evaluated the caching components against them. Finally we chose ehcache as our caching component, and the project met its performance criteria quite easily and became a success.

Having worked extensively with caching components, I have a good idea of how to choose or write a good, feature-rich, fully configurable caching component. If you are about to choose a cache component or write your own, it is good to keep these things in mind.

1. Does the caching component work fine in a multi-threaded environment like Servlets?*** Most of the caching components I have come across do support this, but it is always worth checking; after all, it is a very important feature. Otherwise, wherever you use the cache, you end up writing a synchronized block yourself (user level). If the component supports it by default, you don't need to worry about multi-threading, since the component itself locks the storage during updates and retrievals of cache elements (API level).

2. Does it have a memory limit?*** If you can instruct the component that the cache memory footprint may grow to a maximum of, say, 50MB, it helps keep the footprint under control. It is even better if the component can spill the contents over to a file or a database rather than just reporting that the cache has reached its limit.

3. Does it expire the cache contents?*** It is useful if some of the elements, or the entire cache, expire after a specific period of time rather than living in memory forever.

4. How good are the cache eviction policies?*** When the cache spills over, will it simply overflow whatever comes in to the secondary storage, or will it move, let's say, the least recently used elements to disk and keep the newly added one in memory? Is it configurable?

5. How is the update of cache elements taken care of?*** This is the most important point to note. You may not have a problem if the cached data is read-only. But in most scenarios we cache data from a database, and it changes periodically or on user request. If the change is periodic, setting an expiration time helps, and some cache components can even be told how to refresh a cache element themselves. The ultimate cache update facility! Otherwise, the cache has to be updated manually.

The update of the cache has one more side to it. Let's say you query a cache element like this:

Client.java

....
Employee e = empCache.getEmployee("3435");
....

After this query, if the cache element "3435" is updated in the cache, will the client, or whoever references the element, get notified? What mechanisms are available for this?

6. How small is the memory footprint? How much memory does the cache component occupy for storing n objects compared to other components? You may think it depends entirely on the objects we create. Mostly, yes, but if the component is designed badly, it is likely to occupy more space than is required to store the contents. This may not matter to everyone, but it certainly does for memory-intensive applications.

7. Is it a distributed cache? If the application mandates it, you may need a caching system that can be accessed by remote JVMs.

8. Shutdown and restart. If you have to shut down the application and restart it, can the cache be stored in a file or database and restored when the application comes alive once again? It may not be an important feature, but it is good to have.

9. How simple/complex is the configuration? Worth exploring this.

10. Dependencies. Though it is always good to choose a component with very few dependencies on other components, you can ignore dependencies such as log4j, commons, etc., because you will most likely be referencing them in your project already. It is good to count only the new dependencies.

*** - Very important
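
To give a feel for points 1, 2 and 4, here is a minimal sketch of a bounded cache with least-recently-used eviction, built only on the JDK's LinkedHashMap. The class name LruCache is my own, and the sketch makes simplifying assumptions: the limit is a number of entries rather than megabytes, and evicted entries are simply dropped instead of being spilled to disk. It is not how ehcache or any other component is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: keeps at most 'capacity' entries and evicts
// the least recently used one when a new entry would exceed the limit.
class LruCache<K, V> {
    private final Map<K, V> map;

    LruCache(final int capacity) {
        // accessOrder = true makes iteration order run from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Locking happens inside the component (API level), so callers need no
    // synchronized blocks of their own. Even get() must be synchronized,
    // because in access order a read reorders the underlying map.
    synchronized V get(K key) { return map.get(key); }
    synchronized void put(K key, V value) { map.put(key, value); }
    synchronized int size() { return map.size(); }
}

class LruDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("3435", "Employee A");
        cache.put("3436", "Employee B");
        cache.get("3435");               // "3435" is now the most recently used
        cache.put("3437", "Employee C"); // evicts "3436", the least recently used

        System.out.println(cache.get("3436")); // null
        System.out.println(cache.get("3435")); // Employee A
        System.out.println(cache.size());      // 2
    }
}
```

A real component would make the policy configurable and would overflow to secondary storage rather than discard; the sketch only shows what to look for when comparing components.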

I am not suggesting that ehcache is the best on all these points; we chose it because it fitted our requirements. Some of these may not be applicable to ehcache.

If you have some other important feature which needs to be looked into to evaluate a caching component, please reply to me.

In my next post, I shall try to evaluate some of the open source cache components against these criteria.