Wednesday, March 9, 2011

Optimizing your Rails site with HTTP Caching

HTTP caching is a critical component of the web: it eliminates the need to send requests at all, or to send full responses when a request is made. It is equipped with robust expiry mechanisms and has strong industry support. Given this you would expect it to be a common part of any web developer's repertoire. However, far too often dynamic web sites only consider HTTP caching after they have exhausted other caching options, often having suffered degraded service or outages in the meantime. In this post I will show you how straightforward it is to add HTTP caching to your Rails app, and outline some of the advantages and disadvantages of doing so.

So as an example, let's say we have a World of Warcraft site on which we display information about items:


The items controller backing this page could look something like this:
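A minimal sketch of that controller, assuming a plain Item model and the conventional RESTful show action:

    class ItemsController < ApplicationController
      def show
        @item = Item.find(params[:id])

        # Publicly cacheable for three hours; the 'max-stale' string key is
        # passed through verbatim as an extra Cache-Control directive.
        expires_in 3.hours, :public => true, 'max-stale' => 0
      end
    end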



This use of expires_in is equivalent to: headers['Cache-Control'] = "max-age=10800, public, max-stale=0". This header sets three things: how long the cache is valid for (a max-age of 3 hours, or 10800 seconds), that this page may be stored in shared caches (public), and that the period for which a stale copy of this document may be served is zero seconds (max-stale=0). When we put a reverse proxy cache like varnish between our rails servers and the internet, the cache for a page is populated the first time any user views it.


Here you can see Anne requests /items/1, populating varnish's cache. Subsequent requests that arrive before the cache expires are served from varnish and never reach our rails processes.


By serving these pages out of a reverse proxy like varnish instead of rails we increase both our Apdex score and our fault tolerance: our rails processes can go down and we can continue serving requests from the reverse proxy.

Using max-age works well for something like a warcraft item page: warcraft items do not update frequently, and even if they are a few hours out of date it's not the end of the world (of warcraft). However, what if we wanted to add some dynamic content, like comments, to our item pages? One option that keeps our pages cacheable yet provides dynamic content is to use Segmentation by Freshness.
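A sketch of the view side, assuming jQuery is available and a nested /items/:id/comments route (item_comments_path is that assumed helper):

    <%# app/views/items/show.html.erb -- comments are loaded separately %>
    <div id="comments">
      <noscript>
        <%= link_to "View comments", item_comments_path(@item) %>
      </noscript>
    </div>

    <script type="text/javascript">
      // Progressive enhancement: pull the separately cached comments
      // fragment into the page when JavaScript is available.
      $(function() {
        $("#comments").load("<%= item_comments_path(@item) %>");
      });
    </script>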


As you can see here, we use progressive enhancement to render the comments inline on the page if JavaScript is supported. The comments controller backing this would look something like:
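A minimal sketch, assuming the nested route and a Comment model belonging to Item; the Cache-Control header is written by hand here, mirroring the equivalence shown earlier:

    class CommentsController < ApplicationController
      def index
        @item     = Item.find(params[:item_id])
        @comments = @item.comments

        # Short shared-cache lifetime; must-revalidate tells caches to check
        # back with us once the entry expires rather than serving it stale.
        headers['Cache-Control'] = 'max-age=300, public, must-revalidate'

        # Conditional GET: stale? sets an ETag derived from @comments and, if
        # the client's If-None-Match already matches it, renders 304 Not Modified.
        if stale?(:etag => @comments)
          render :layout => false
        end
      end
    end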


Here we are doing a "Conditional GET" with the rails stale? method. If the client has previously seen this version of the comments page we return a 304 Not Modified; otherwise we render the document for them and append an ETag header. You can also see that we add a must-revalidate directive to the Cache-Control header, which instructs caches that they must revalidate the document once it expires. The difference between must-revalidate and max-stale is that, if a client cannot reach the server, a cache governed by max-stale may still return the expired document with a Warning header set, whereas must-revalidate forbids serving it at all. Here is what the flow for caching of the item's comments page would look like:

Here you can see Anne does a GET on /items/1/comments and populates the proxy cache.

Subsequent requests can use If-None-Match to do a conditional GET. As an alternative to ETag, you can also make use of the Last-Modified header to cache dynamic content.
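For example, fresh_when can revalidate the item page on its timestamp (the Item model and its updated_at column are the same assumptions as above):

    class ItemsController < ApplicationController
      def show
        @item = Item.find(params[:id])

        # Clients send If-Modified-Since on subsequent requests and receive a
        # 304 Not Modified whenever the record has not changed since then.
        fresh_when :last_modified => @item.updated_at, :public => true
      end
    end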

When using a reverse proxy like varnish there is some important configuration you will need to do. Here is an example, with explanations, of a varnish configuration you could use for our World of Warcraft site:
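A minimal VCL sketch in that spirit, written for the Varnish 2/3 syntax of the era (newer Varnish renames vcl_fetch to vcl_backend_response); the backend address, grace period, and blanket cookie stripping are assumptions you would adapt to your own app:

    backend rails {
      .host = "127.0.0.1";
      .port = "3000";
    }

    sub vcl_recv {
      # Rails session cookies would make every request uncacheable; for
      # purely public pages like items we can safely drop them.
      unset req.http.Cookie;

      # Allow stale objects to be served for up to an hour if the backend
      # stops responding.
      set req.grace = 1h;
    }

    sub vcl_fetch {
      if (beresp.http.Cache-Control ~ "public") {
        # Cache only what the application explicitly marked as public.
        unset beresp.http.Set-Cookie;
        set beresp.grace = 1h;
      } else {
        # Everything else goes straight through without being stored.
        set beresp.ttl = 0s;
      }
      return (deliver);
    }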


Overall this approach provides several benefits that I would like to emphasize:
Fault tolerance: A reverse proxy such as varnish can continue to serve requests should our app servers stop responding.
Higher Apdex score: As pages will typically be served by varnish, your users will be much happier with faster page load times.
Testing: Given their stateless and declarative nature, unit testing HTTP headers is generally easier than testing other caching approaches (see the test sketch after this list).
Performance: Reverse proxies are VERY fast; it is fair to say that you will get very good mileage from HTTP caching with a reverse proxy.
Cost: There is a cost, albeit a small one, associated with every page our services generate. Why pay for servers to keep rendering the same response?
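To illustrate the testing point above, a functional test can assert on the headers directly; ItemsController is the sketch from earlier and the fixture name is made up:

    require 'test_helper'

    class ItemsControllerTest < ActionController::TestCase
      test "item pages are publicly cacheable for three hours" do
        # :thunderfury is a hypothetical items fixture.
        get :show, :id => items(:thunderfury).id

        assert_response :success
        assert_match /public/,        @response.headers['Cache-Control']
        assert_match /max-age=10800/, @response.headers['Cache-Control']
      end
    end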

HTTP caching is not for everyone or every problem. For example, if you deal with sensitive information you would not want to put that information into a shared cache. If your pages already contain user-specific data you will need to apply an approach like segmentation by freshness before you can cache them. If you are serving data that is very dynamic and user-centric, HTTP caching may not be a good fit either. However, if you consider HTTP caching in earnest you may be surprised by how much you can do with it.
