14.2. HTTP Caching

HTTP caching attempts to reuse already loaded web pages or files. For example, if you visit a web page such as http://www.heise.de or http://www.spiegel.de several times a day to read the latest news, then certain elements of that page (for example, the logo image at the top of the page) will not be loaded again on your second visit. Your browser already has these files in the cache, which saves loading time and bandwidth.
Within the Rails framework, our aim is to answer the question "Has a page changed?" as early as the controller, because most of the time is normally spent rendering the page in a view. You can see this really well in the section called “List of All Companies (Index View)”: of the total 85ms, a massive 71.9ms, roughly 85% of the overall time, is spent on rendering the view.

Last-Modified

Important

Please modify the times used in the examples in accordance with your own local circumstances.
The web browser knows when it has downloaded a web page and then placed it into the cache. It can pass this information to the web server in an If-Modified-Since: header. The web server can then compare this information to the corresponding file and either deliver a newer version or return an HTTP 304 Not Modified code as response. In case of a 304, the web server sends no page content and the web browser displays the version from its cache. Now you are going to say, "That's all very well for images, but it won't help me at all for dynamically generated web pages such as the Index view of the companies." Ah, but you are underestimating what Rails can do. ;-)
Please edit the index and show methods in the controller file app/controllers/companies_controller.rb as follows:
  def index
    @companies = Company.order(:id)

    fresh_when last_modified: @companies.maximum(:updated_at)
  end

  def show
    @company = Company.find(params[:id])

    fresh_when last_modified: @company.updated_at
  end

Note

We use @companies = Company.order(:id) instead of @companies = Company.all in order to be able to use ActiveRecord's lazy loading (see the section called “Lazy Loading”).
After restarting the Rails application, we have a look at the HTTP header of http://0.0.0.0:3000/companies:
$ curl -I http://0.0.0.0:3000/companies
HTTP/1.1 200 OK 
Last-Modified: Fri, 13 Jul 2012 12:14:50 GMT
[...]
$
The Last-Modified entry in the HTTP header was generated by fresh_when in the controller. If we later go to the same web page and specify this time as well, then we do not get the web page back, but a 304 Not Modified message:
$ curl -I http://0.0.0.0:3000/companies --header 'If-Modified-Since: Fri, 13 Jul 2012 12:14:50 GMT'
HTTP/1.1 304 Not Modified 
Last-Modified: Fri, 13 Jul 2012 12:14:50 GMT
Cache-Control: max-age=0, private, must-revalidate
X-Ua-Compatible: IE=Edge
X-Request-Id: 7802f078add46dc372adaec92f343fe2
X-Runtime: 0.008647
Server: WEBrick/1.3.1 (Ruby/1.9.3/2012-04-20)
Date: Fri, 13 Jul 2012 14:27:15 GMT
Connection: close

$
In the Rails log, we find this:
Started HEAD "/companies" for 127.0.0.1 at 2012-07-13 16:29:53 +0200
Processing by CompaniesController#index as */*
   (0.2ms)  SELECT MAX("companies"."updated_at") AS max_id FROM "companies" 
Completed 304 Not Modified in 2ms (ActiveRecord: 0.2ms)
Rails took 2ms to answer this request, compared to the 85ms of the standard variation. This is more than 40 times faster! So the server spends only a small fraction of the resources on this request, and you save a massive amount of bandwidth because no page content is transferred. The user will be able to see the page much more quickly.
This result was achieved through @companies.maximum(:updated_at) in the controller. We only had to check when the last update was done in the database. As soon as a single company record changes, the value is set to the then current time and the whole web page is delivered once more. With this method, you can also deliver dynamically generated web pages via Last-Modified headers.
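The comparison behind this can be sketched in plain Ruby, without Rails. The helper conditional_status below is a hypothetical name chosen for illustration and not part of Rails. It only shows the idea of comparing the newest updated_at value with the time the browser sends in If-Modified-Since:

```ruby
require "time"

# Hypothetical helper (not a Rails method): picks the status code a server
# would send for a conditional GET that is based on Last-Modified.
def conditional_status(last_modified, if_modified_since_header)
  # No header means the client has nothing cached yet.
  return 200 if if_modified_since_header.nil?

  client_time = Time.httpdate(if_modified_since_header)
  # HTTP dates have a resolution of one second, so compare whole seconds.
  last_modified.to_i <= client_time.to_i ? 304 : 200
end

last_update = Time.utc(2012, 7, 13, 12, 14, 50)
header      = last_update.httpdate

puts conditional_status(last_update, nil)          # first visit
puts conditional_status(last_update, header)       # nothing has changed
puts conditional_status(last_update + 60, header)  # a record was updated later
```

The first and third call result in a full response (200), the second one in 304 Not Modified.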

Etag

Sometimes the updated_at field of a particular object is not meaningful on its own. For example, if you have a web page where users can log in and this page then generates web page contents based on a role model, it can happen that user A as admin is able to see an Edit link that is not displayed to user B as normal user. In such a scenario, the Last-Modified header explained in the section called “Last-Modified” does not help.
In these cases, we can use the etag header. The etag is generated by the web server and delivered when the web page is first visited. If the user visits the same URL again, the browser can then check if the corresponding web page has changed by sending an If-None-Match: header to the web server.
Please edit the index and show methods in the controller file app/controllers/companies_controller.rb as follows:
def index
  @companies = Company.all

  fresh_when etag: @companies
end

def show
  @company = Company.find(params[:id])

  fresh_when etag: @company
end
A special Rails feature comes into play for the etag: Rails automatically sets a new CSRF token for each new visitor of the website. This prevents cross-site request forgery attacks (see http://en.wikipedia.org/wiki/Cross_site_request_forgery). But it also means that each new user of a web page gets a new etag for the same page. To ensure that the same user always gets the same CSRF token, the token is stored in a cookie by the web browser and consequently sent back to the web server every time the web page is visited. curl, which we use during development, does not do this by default. But we can tell curl to save all cookies in a file and to send these cookies along with later requests.
For saving, we use the -c cookies.txt parameter.
$ curl -I http://0.0.0.0:3000/companies -c cookies.txt
HTTP/1.1 200 OK 
Etag: "b5f711016cb2e5fce352230e607ceffe"
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=0, private, must-revalidate
[...]

$
With the parameter -b cookies.txt, curl sends these cookies to the web server with each request. Now we get the same etag for two subsequent requests:
$ curl -I http://0.0.0.0:3000/companies -b cookies.txt
HTTP/1.1 200 OK 
Etag: "132c1be24595b9b5f7b2c08b300592b1"
[...]

$ curl -I http://0.0.0.0:3000/companies -b cookies.txt
HTTP/1.1 200 OK 
Etag: "132c1be24595b9b5f7b2c08b300592b1"
[...]

$
We now use this etag to find out in the request with If-None-Match if the version we have cached is still up to date:
$ curl -I http://0.0.0.0:3000/companies -b cookies.txt --header 'If-None-Match: "132c1be24595b9b5f7b2c08b300592b1"'
HTTP/1.1 304 Not Modified 
Etag: "132c1be24595b9b5f7b2c08b300592b1"
Cache-Control: max-age=0, private, must-revalidate
[...]

$
We get a 304 Not Modified in response. Let's look at the Rails log:
Started HEAD "/companies" for 127.0.0.1 at 2012-07-13 18:45:38 +0200
Processing by CompaniesController#index as */*
  Company Load (0.3ms)  SELECT "companies".* FROM "companies" 
Completed 304 Not Modified in 3ms (ActiveRecord: 0.3ms)
Rails only took 3ms to process the request. Almost 30 times as fast as the variation without cache! Plus we have saved bandwidth again. The user will be happy with the speedy web application.
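To make the mechanism more tangible, here is a minimal plain-Ruby sketch of the idea. It is a simplification and not the code Rails uses internally: CompanyRecord, etag_for and status_for are hypothetical names for this illustration, and the assumption is simply that the etag is a digest of a key that changes whenever a record changes:

```ruby
require "digest/md5"

# Hypothetical stand-in for an ActiveRecord model; only cache_key matters here.
CompanyRecord = Struct.new(:id, :updated_at) do
  def cache_key
    "companies/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

# Build a quoted etag from the cache keys of all records.
def etag_for(records)
  %("#{Digest::MD5.hexdigest(records.map(&:cache_key).join('/'))}")
end

# Compare the current etag with the value sent in If-None-Match.
def status_for(current_etag, if_none_match)
  current_etag == if_none_match ? 304 : 200
end

time      = Time.utc(2012, 7, 13, 12, 0, 0)
companies = [CompanyRecord.new(1, time), CompanyRecord.new(2, time)]
etag      = etag_for(companies)

puts status_for(etag, nil)   # first visit, no etag known yet
puts status_for(etag, etag)  # browser sends the etag it has cached

companies[1].updated_at = time + 3600          # one company is edited
puts status_for(etag_for(companies), etag)     # cached etag no longer matches
```

Only the second request is answered with 304, because only there does the etag sent by the browser still match the current one.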

current_user and Other Potential Parameters

As the basis for generating an etag, we can pass not only a single object, but also an array of objects. This way, we can solve the problem with the logged-in user. Let's assume that the logged-in user is returned by the method current_user. The methods index and show would then look like this in the app/controllers/companies_controller.rb controller:
  def index
    @companies = Company.all

    fresh_when etag: [@companies, current_user]
  end

  def show
    @company = Company.find(params[:id])

    fresh_when etag: [@company, current_user]
  end
You can accommodate any number of objects in this array and use this approach to define when a page has not changed.
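Why the array helps can be shown with a small plain-Ruby sketch. Record, expand_key and etag_for are hypothetical names used only for this illustration, and the key expansion is a simplified assumption about how nested objects might be combined into one key. The point is that the user becomes part of the digest:

```ruby
require "digest/md5"

# Hypothetical stand-in for models; only cache_key is needed.
Record = Struct.new(:model, :id, :version) do
  def cache_key
    "#{model}/#{id}-#{version}"
  end
end

# Flatten nested arrays of records into one key; nil stands for
# "nobody is logged in".
def expand_key(source)
  case source
  when Array then source.map { |item| expand_key(item) }.join("/")
  when nil   then "guest"
  else source.cache_key
  end
end

def etag_for(source)
  %("#{Digest::MD5.hexdigest(expand_key(source))}")
end

companies = [Record.new("companies", 1, 3), Record.new("companies", 2, 1)]
admin     = Record.new("users", 1, 7)
normal    = Record.new("users", 2, 4)

puts etag_for([companies, admin])   # etag for user A
puts etag_for([companies, normal])  # different etag for user B
puts etag_for([companies, nil])     # and another one for guests
```

The list of companies is identical in all three calls, yet each visitor gets a different etag, so a page cached for the admin is never considered fresh for a normal user.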

Combining Etag and Last-Modified

You can also use Etag and Last-Modified together. Here is what it looks like:
  def index
    @companies = Company.order(:id)

    fresh_when :etag => @companies.all, 
               :last_modified => @companies.maximum(:updated_at)
  end

  def show
    @company = Company.find(params[:id])

    fresh_when @company
  end
As you can see, there is an abbreviated form for the show action. If you pass a record such as @company directly, fresh_when uses the record itself for the etag and its updated_at method for Last-Modified.

The Magic of touch

What happens if an Employee is edited or deleted? Then the show view and potentially also the index view would have to change as well. That is the reason for the line
belongs_to :company, :touch => true
in the employee model. With :touch => true, every time an object of the class Employee is saved or destroyed, ActiveRecord also updates the parent Company record in the database: its updated_at field is set to the current time. It is "touched".
This approach ensures that a correct web page is once more delivered.
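The effect can be imitated in a few lines of plain Ruby. The two classes below are hypothetical stand-ins without any database, and the now argument is passed in explicitly only to keep the sketch deterministic. They merely illustrate that saving the child moves the timestamp of the parent:

```ruby
# Hypothetical plain-Ruby models that imitate the effect of
# belongs_to :company, :touch => true (no ActiveRecord involved).
class Company
  attr_reader :updated_at

  def initialize(updated_at)
    @updated_at = updated_at
  end

  # "Touching" just moves updated_at forward.
  def touch(now)
    @updated_at = now
  end
end

class Employee
  def initialize(company)
    @company = company
  end

  # Saving the child also touches the parent.
  def save(now)
    @company.touch(now)
    true
  end
end

company  = Company.new(Time.utc(2012, 7, 13, 12, 0, 0))
employee = Employee.new(company)
before   = company.updated_at

employee.save(Time.utc(2012, 7, 14, 9, 30, 0))
puts company.updated_at > before  # true
```

Because the updated_at value of the company is newer after the save, both fresh_when last_modified: and fresh_when etag: see a changed company and deliver the page again.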

stale?

Up to now, we have always assumed that only HTML pages are delivered. So we were able to use fresh_when and do without the respond_to do |format| block. But HTTP caching is not limited to HTML pages. If we also render JSON (for example) and want to deliver it via HTTP caching, we need to use the method stale?. It works much like fresh_when, but returns true when the response still has to be rendered. The example of the section called “Combining Etag and Last-Modified” would then look like this if we use stale? and additionally render JSON:
def index
  @companies = Company.order(:id)

  if stale? :etag => @companies.all, 
            :last_modified => @companies.maximum(:updated_at)
    respond_to do |format|
      format.html
      format.json { render json: @companies }
    end
  end
end

def show
  @company = Company.find(params[:id])

  if stale? @company
    respond_to do |format|
      format.html
      format.json { render json: @company }
    end
  end
end
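How the two methods relate can be sketched with a miniature controller in plain Ruby. MiniController is a hypothetical class for this illustration only and handles nothing but an etag. The real Rails methods accept more options, but the division of labour is the same idea: fresh_when answers with 304 on its own, stale? additionally tells us whether we still have to render something:

```ruby
# Hypothetical miniature controller, not a Rails class.
class MiniController
  attr_reader :status, :body

  def initialize(if_none_match)
    @if_none_match = if_none_match
  end

  # Like fresh_when: answer with 304 if the client's copy is current.
  def fresh_when(etag)
    @status = 304 if etag == @if_none_match
  end

  # Like stale?: true when the caller still has to render a response.
  def stale?(etag)
    fresh_when(etag)
    @status != 304
  end

  def index(etag, format)
    if stale?(etag)
      @status = 200
      @body   = format == :json ? '{"companies":[]}' : "<html>...</html>"
    end
    self
  end
end

cached = MiniController.new('"abc"').index('"abc"', :json)
puts cached.status  # 304, nothing is rendered

fresh = MiniController.new(nil).index('"abc"', :json)
puts fresh.status   # 200
puts fresh.body     # {"companies":[]}
```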

Using Proxies (public)

Up to now, we always assumed that we are using a cache in the web browser. But on the Internet, there are many proxies that are often closer to the user and can therefore be useful for caching non-personalized pages. If our example was a publicly accessible phone book, then we could activate the free services of the proxies with the parameter public: true in fresh_when or stale?. The example of the section called “Combining Etag and Last-Modified” would then look like this if using public: true:
def index
  @companies = Company.order(:id)

  fresh_when :etag => @companies.all, 
             :last_modified => @companies.maximum(:updated_at),
             :public => true
end

def show
  @company = Company.find(params[:id])

  fresh_when @company, public: true
end
We go to the web page and get the output:
$ curl -I http://0.0.0.0:3000/companies
HTTP/1.1 200 OK 
Etag: "d45a37972109e8ccea1160d81a6ff79d"
Last-Modified: Sat, 14 Jul 2012 12:40:25 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: public
[...]
The header Cache-Control: public tells all proxies that they can also cache this web page.

Warning

Using proxies always has to be done with great caution. On the one hand, they are brilliantly suited for delivering your own web page quickly to more users, but on the other, you have to be absolutely sure that no personalized pages are cached on public proxies. For example, CSRF tags and Flash messages should never end up in a public proxy. To be sure with the CSRF tags, it is a good idea to make the output of csrf_meta_tag in the default app/views/layouts/application.html.erb layout dependent on the question whether the page may be cached publicly or not:
<%= csrf_meta_tag unless response.cache_control[:public] %>

Cache-Control With Time Limit

When using Etag and Last-Modified we assume in the section called “Etag” and the section called “Last-Modified” that the web browser definitely checks once more with the web server if the cached version of a web page is still current. This is a very safe approach.
But you can take the optimization one step further by predicting the future: if I am already sure when delivering the web page that it is not going to change in the next two minutes, hours or days, then I can tell the web browser this directly. It then does not need to check back again within this specified period of time. Saving this round trip pays off especially for mobile web browsers, where latency is relatively high. It also reduces the load on the web server.
In the output of the HTTP header, you may already have noticed the corresponding line in the Etag and Last-Modified examples:
Cache-Control: max-age=0, private, must-revalidate
The item must-revalidate tells the web browser that it should definitely check back with the web server to see if a web page has changed in the meantime. The second parameter private means that only the web browser is allowed to cache this page. Any proxies on the way are not permitted to cache this page.
If we decide for our phone book that the web page is going to stay unchanged for at least 2 minutes, then we can expand the example from the section called “Combining Etag and Last-Modified” by adding the method expires_in. The controller app/controllers/companies_controller.rb would then contain the following code for the methods index and show:
def index
  @companies = Company.order(:id)

  expires_in 2.minutes
  fresh_when :etag => @companies.all, :last_modified => @companies.maximum(:updated_at)
end

def show
  @company = Company.find(params[:id])

  expires_in 2.minutes
  fresh_when @company
end
Now we get different cache control information in response to a request:
$ curl -I http://0.0.0.0:3000/companies
HTTP/1.1 200 OK 
Etag: "d45a37972109e8ccea1160d81a6ff79d"
Last-Modified: Sat, 14 Jul 2012 12:40:25 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=120, private
[...]
The two minutes are specified in seconds (max-age=120) and we no longer need must-revalidate. So in the next 120 seconds, the web browser does not need to check back with the web server to see if the content of this page has changed.
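From the point of view of the web browser, the decision can be sketched roughly like this. The two helper methods are hypothetical and deliberately ignore everything in Cache-Control except max-age. They are an illustration of the principle, not a complete implementation of the HTTP caching rules:

```ruby
# Hypothetical client-side helpers: may a cached response be reused
# without asking the web server again?
def max_age(cache_control)
  match = cache_control.match(/max-age=(\d+)/)
  match ? match[1].to_i : 0
end

def reusable_without_request?(cache_control, stored_at, now)
  (now - stored_at) < max_age(cache_control)
end

stored_at = Time.utc(2012, 7, 14, 12, 40, 25)

# One minute after the download: still within the two minutes.
puts reusable_without_request?("max-age=120, private", stored_at, stored_at + 60)   # true
# Three minutes later: the browser has to check back.
puts reusable_without_request?("max-age=120, private", stored_at, stored_at + 180)  # false
# With max-age=0 the browser always has to check back.
puts reusable_without_request?("max-age=0, private, must-revalidate", stored_at, stored_at + 1)  # false
```

Once the two minutes are over, the browser falls back to the Etag and Last-Modified check described above, so an unchanged page can still be answered with a 304.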

Note

This mechanism is also used by the asset pipeline. Assets created there in the production environment can be identified clearly by the checksum in the file name and can be cached for a very long time both in the web browser and in public proxies. That's why we have the following section in the nginx configuration file in the section called “nginx Configuration”:
location ^~ /assets/ {
  gzip_static on;
  expires max;
  add_header Cache-Control public;
}
