15.2. HTTP Caching

HTTP caching attempts to reuse already loaded web pages or files. For example, if you visit a web page such as http://www.heise.de or http://www.spiegel.de several times a day to read the latest news, then certain elements of that page (for example, the logo image at the top of the page) will not be loaded again on your second visit. Your browser already has these files in the cache, which saves loading time and bandwidth.
Within the Rails framework, our aim is to answer the question "Has a page changed?" as early as the controller, because most of the time is normally spent rendering the page in the view. I'd like to repeat that: most of the time is spent rendering the page in the view!

Last-Modified

Buy the new Rails 5.1 version of this book.

Please modify the times used in the examples in accordance with your own local circumstances.
The web browser knows when it downloaded a web page and placed it into the cache. It can pass this information to the web server in an If-Modified-Since: header. The web server can then compare this information to the corresponding file and either deliver a newer version or return an HTTP 304 Not Modified code as response. In case of a 304, no page body is sent and the web browser uses the version it already has in its cache. Now you are going to say, "That's all very well for images, but it won't help me at all for dynamically generated web pages such as the index view of the companies." Ah, but you are underestimating what Rails can do. ;-)
Please edit the show method in the controller file app/controllers/companies_controller.rb as follows:
# GET /companies/1
# GET /companies/1.json
def show
  fresh_when last_modified: @company.updated_at
end
After restarting the Rails application, we have a look at the HTTP header of http://0.0.0.0:3000/companies/1:
$ curl -I http://0.0.0.0:3000/companies/1
HTTP/1.1 200 OK 
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Ua-Compatible: chrome=1
Last-Modified: Wed, 17 Jul 2013 21:50:01 GMT
[...]
$
The Last-Modified entry in the HTTP header was generated by fresh_when in the controller. If we later request the same web page and send this time in an If-Modified-Since header, then we do not get the web page back, but a 304 Not Modified message:
$ curl -I http://0.0.0.0:3000/companies/1 --header 'If-Modified-Since: Wed, 17 Jul 2013 21:50:01 GMT'
HTTP/1.1 304 Not Modified 
[...]
$
In the Rails log, we find this:
Started HEAD "/companies/1" for 127.0.0.1 at 2013-07-18 08:27:10 +0200
Processing by CompaniesController#show as */*
  Parameters: {"id"=>"1"}
  Company Load (0.1ms)  SELECT "companies".* FROM "companies" WHERE "companies"."id" = ? LIMIT 1  [["id", "1"]]
Completed 304 Not Modified in 2ms (ActiveRecord: 0.1ms)
Rails took 2ms to answer this request, compared to the 11ms of the standard variation. That is much faster: you have used fewer resources on the server and saved bandwidth, because no page body had to be sent. The user will be able to see the page much more quickly.
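The comparison behind a 304 can be sketched in plain Ruby, without Rails. This is only an illustration of the principle, not the code Rails runs: the helper name not_modified_since? is made up for this example, and it assumes that the header carries an HTTP date with one-second resolution.

```ruby
require "time"

# Hypothetical helper (not part of Rails): decides whether a resource counts
# as "not modified" for a given If-Modified-Since header value.
def not_modified_since?(if_modified_since, updated_at)
  return false if if_modified_since.nil?

  since = Time.httpdate(if_modified_since)
  # HTTP dates have one-second resolution, so compare whole seconds.
  updated_at.to_i <= since.to_i
rescue ArgumentError
  # An unparsable header is treated as if it were absent.
  false
end

updated_at = Time.utc(2013, 7, 17, 21, 50, 1)
header     = updated_at.httpdate # "Wed, 17 Jul 2013 21:50:01 GMT"

puts not_modified_since?(header, updated_at)      # true  -> answer with 304
puts not_modified_since?(header, updated_at + 60) # false -> answer with 200
puts not_modified_since?(nil, updated_at)         # false -> answer with 200
```

The important point is that this check only needs the updated_at value of the record. The view does not have to be rendered to make the decision.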

Etag

Sometimes the updated_at field of a particular object is not meaningful on its own. For example, if you have a web page where users can log in and this page then generates web page contents based on a role model, it can happen that user A as admin is able to see an Edit link that is not displayed to user B as normal user. In such a scenario, the Last-Modified header explained in the section called “Last-Modified” does not help.
In these cases, we can use the etag header. The etag is generated by the web server and delivered when the web page is first visited. If the user visits the same URL again, the browser can then check if the corresponding web page has changed by sending an If-None-Match: header to the web server.
Please edit the index and show methods in the controller file app/controllers/companies_controller.rb as follows:
# GET /companies
# GET /companies.json
def index
  @companies = Company.all
  fresh_when etag: @companies
end

# GET /companies/1
# GET /companies/1.json
def show
  fresh_when etag: @company
end
A special Rails feature comes into play for the etag: Rails automatically sets a new CSRF token for each new visitor of the website. This prevents cross-site request forgery attacks (see http://en.wikipedia.org/wiki/Cross_site_request_forgery). But it also means that each new user of a web page gets a new etag for the same page. To ensure that the same user always gets an identical CSRF token, it is stored in a cookie by the web browser and consequently sent back to the web server every time the web page is visited. The curl we use for development does not do this by default. But we can tell curl to save all cookies in a file and to send these cookies along with later requests.
For saving, we use the -c cookies.txt parameter.
$ curl -I http://0.0.0.0:3000/companies -c cookies.txt
HTTP/1.1 200 OK 
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Ua-Compatible: chrome=1
Etag: "e57e45d14a0afc4377c81fc5ecc951b0"
[...]

$
With the parameter -b cookies.txt, curl sends these cookies to the web server with each request. Now we get the same etag for two subsequent requests:
$ curl -I http://0.0.0.0:3000/companies -b cookies.txt
HTTP/1.1 200 OK 
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Ua-Compatible: chrome=1
Etag: "e57e45d14a0afc4377c81fc5ecc951b0"
[...]

$ curl -I http://0.0.0.0:3000/companies -b cookies.txt
HTTP/1.1 200 OK 
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Ua-Compatible: chrome=1
Etag: "e57e45d14a0afc4377c81fc5ecc951b0"
[...]

$
We now use this etag to find out in the request with If-None-Match if the version we have cached is still up to date:
$ curl -I http://0.0.0.0:3000/companies -b cookies.txt --header 'If-None-Match: "e57e45d14a0afc4377c81fc5ecc951b0"'
HTTP/1.1 304 Not Modified 
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Ua-Compatible: chrome=1
Etag: "e57e45d14a0afc4377c81fc5ecc951b0"
[...]

$
We get a 304 Not Modified in response. Let's look at the Rails log:
Started HEAD "/companies" for 127.0.0.1 at 2013-07-18 08:32:43 +0200
Processing by CompaniesController#index as */*
  Company Load (0.3ms)  SELECT "companies".* FROM "companies"
Completed 304 Not Modified in 4ms (ActiveRecord: 0.3ms)
Rails only took 4ms to process the request. Almost 10 times as fast as the variation without cache! Plus we have saved bandwidth again. The user will be happy with the speedy web application.
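The If-None-Match exchange can be sketched in plain Ruby as well. The helpers etag_for and status_for are hypothetical names for this example, and hashing a simple joined string with MD5 is an assumption made for illustration. Rails derives its cache key differently, so the values will not match the etags in the curl output above.

```ruby
require "digest/md5"

# Hypothetical stand-in for the etag computation: hash a cache key.
# Rails builds its key differently; this only shows the principle.
def etag_for(*parts)
  %("#{Digest::MD5.hexdigest(parts.join("/"))}")
end

# Returns the status code a server would answer with.
def status_for(if_none_match, current_etag)
  if_none_match == current_etag ? 304 : 200
end

v1 = etag_for("companies", 1, "2013-07-17 21:50:01 UTC")
v2 = etag_for("companies", 1, "2013-07-18 09:00:00 UTC")

puts status_for(nil, v1) # 200: first visit, no etag known yet
puts status_for(v1, v1)  # 304: the cached version is still current
puts status_for(v1, v2)  # 200: the record changed, so the etag changed
```

As soon as anything that goes into the key changes, the etag changes too, and the browser's stored value no longer matches.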

current_user and Other Potential Parameters

As basis for generating an etag, we can pass not just a single object, but also an array of objects. This way, we can solve the problem with the logged-in user. Let's assume that the logged-in user is returned by the method current_user.
We only have to add etag { current_user.try :id } to app/controllers/application_controller.rb to make sure that all etags in the application include the current user's id, which is nil if nobody is logged in.
class ApplicationController < ActionController::Base
  # Prevent CSRF attacks by raising an exception.
  # For APIs, you may want to use :null_session instead.
  protect_from_forgery with: :exception

  etag { current_user.try :id }
end
You can chain other objects in this array too and use this approach to define when a page has not changed.
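Why the user id has to be part of the etag can be shown with a small plain-Ruby sketch. The helper etag_for and the key format are assumptions for this example and do not reproduce the Rails implementation. The sketch only shows that the same company yields a different etag for each user, and also for a visitor who is not logged in.

```ruby
require "digest/md5"

# Hypothetical etag helper: every element of the array is part of the key.
def etag_for(parts)
  key = parts.map { |part| part.nil? ? "nil" : part.to_s }.join("/")
  %("#{Digest::MD5.hexdigest(key)}")
end

# Made-up cache key for one company record.
company_key = "companies/1-20130717215001"

anonymous = etag_for([company_key, nil]) # nobody logged in
user_a    = etag_for([company_key, 1])   # user with id 1
user_b    = etag_for([company_key, 2])   # user with id 2

puts [anonymous, user_a, user_b].uniq.size   # 3: one etag per viewer
puts user_a == etag_for([company_key, 1])    # true: stable for the same user
```

User A's cached page is therefore never validated against the etag that was computed for user B.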

The Magic of touch

What happens if an Employee is edited or deleted? Then the show view and potentially also the index view would have to change as well. That is the reason for the line
belongs_to :company, touch: true
in the employee model. Every time an object of the class Employee is saved in edited form, and if touch: true is used, ActiveRecord updates the superordinate Company element in the database. The updated_at field is set to the current time. It is "touched".
This approach ensures that a correct content is delivered.
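The effect of touch can be imitated in plain Ruby. The two Struct classes below are simplified stand-ins for the ActiveRecord models, and the names "Acme" and "Jane" are invented for this example. The save method does by hand what touch: true is described as doing above.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models; not real Rails code.
Company = Struct.new(:name, :updated_at) do
  def touch(now = Time.now)
    self.updated_at = now
  end
end

Employee = Struct.new(:name, :company) do
  # Mimics belongs_to :company, touch: true
  def save(now = Time.now)
    company.touch(now)
    true
  end
end

company  = Company.new("Acme", Time.utc(2013, 7, 17, 21, 50, 1))
employee = Employee.new("Jane", company)

before = company.updated_at
employee.name = "Jane Doe"
employee.save(Time.utc(2013, 7, 18, 6, 30, 0))

puts company.updated_at > before # true: the company was "touched"
```

Because updated_at of the company changes, a Last-Modified header or an etag derived from the company changes as well, and the browser's cached copy of the show view is no longer accepted.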

stale?

Up to now, we have always assumed that only HTML pages are delivered. So we were able to use fresh_when and do without the respond_to do |format| block. But HTTP caching is not limited to HTML pages. If we also render JSON (for example) and want to deliver it via HTTP caching, we need to use the method stale?. Using stale? resembles using the method fresh_when. Example:
def show
  @company = Company.find(params[:id])

  if stale? @company
    respond_to do |format|
      format.html
      format.json { render json: @company }
    end
  end
end

Using Proxies (public)

Up to now, we have always assumed that the cache is in the web browser. But on the Internet, there are many proxies that are often closer to the user and can therefore be useful for caching non-personalized pages. If our example was a publicly accessible phone book, then we could activate the free services of the proxies with the parameter public: true in fresh_when or stale?.
Example:
# GET /companies/1
# GET /companies/1.json
def show
  @company = Company.find(params[:id])

  fresh_when @company, public: true
end
We go to the web page and get the output:
$ curl -I http://0.0.0.0:3000/companies/1
HTTP/1.1 200 OK 
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Ua-Compatible: chrome=1
Etag: "81cfb867cac24fad7ff1a7721bfb529a"
Last-Modified: Wed, 17 Jul 2013 21:50:01 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: public
[...]
The header Cache-Control: public tells all proxies that they can also cache this web page.

Proxies always have to be used with great caution. On the one hand, they are brilliantly suited for delivering your own web page quickly to more users, but on the other, you have to be absolutely sure that no personalized pages are cached on public proxies. For example, CSRF tags and flash messages should never end up in a public proxy. To be sure with the CSRF tags, it is a good idea to make the output of csrf_meta_tag in the default app/views/layouts/application.html.erb layout dependent on whether the page may be cached publicly or not:
<%= csrf_meta_tag unless response.cache_control[:public] %>

Cache-Control With Time Limit

With Etag and Last-Modified, as described in the section called “Etag” and the section called “Last-Modified”, we assume that the web browser always checks with the web server whether the cached version of a web page is still current. This is a very safe approach.
But you can take the optimization one step further by predicting the future: if I am already sure when delivering the web page that this web page is not going to change in the next two minutes, hours or days, then I can tell the web browser this directly. It then does not need to check back again within this specified period of time. This overhead saving has advantages, especially with mobile web browsers with relatively high latency. Plus you also save server load on the web server.
In the output of the HTTP header, you may already have noticed the corresponding line in the Etag and Last-Modified examples:
Cache-Control: max-age=0, private, must-revalidate
The item must-revalidate tells the web browser that it should definitely check back with the web server to see if a web page has changed in the meantime. The second parameter private means that only the web browser is allowed to cache this page. Any proxies on the way are not permitted to cache this page.
If we decide for our phone book that the web page is going to stay unchanged for at least 2 minutes, then we can expand the code example by adding the method expires_in. The controller app/controllers/companies_controller.rb would then contain the following code for the method show:
# GET /companies/1
# GET /companies/1.json
def show
  expires_in 2.minutes
  fresh_when @company, public: true
end
Now we get different cache control information in response to a request:
$ curl -I http://0.0.0.0:3000/companies/1
HTTP/1.1 200 OK 
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Ua-Compatible: chrome=1
Date: Thu, 18 Jul 2013 06:55:30 GMT
Etag: "81cfb867cac24fad7ff1a7721bfb529a"
Last-Modified: Wed, 17 Jul 2013 21:50:01 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=120, public
[...]
The two minutes are specified in seconds (max-age=120) and we no longer need must-revalidate. So in the next 120 seconds, the web browser does not need to check back with the web server to see if the content of this page has changed.
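How a client could use max-age can be sketched in a few lines of plain Ruby. The helpers max_age and fresh? are hypothetical names for this example, and the check is deliberately simplified: real browsers and proxies take further headers and directives into account.

```ruby
# Hypothetical helper: extracts max-age (in seconds) from a Cache-Control value.
def max_age(cache_control)
  match = cache_control.match(/max-age=(\d+)/)
  match && match[1].to_i
end

# Simplified freshness check: a cached copy may be reused without asking
# the server as long as it is younger than max-age.
def fresh?(cache_control, fetched_at, now)
  age_limit = max_age(cache_control)
  return false if age_limit.nil? || age_limit.zero?

  (now - fetched_at) < age_limit
end

fetched_at = Time.utc(2013, 7, 18, 6, 55, 30)

puts fresh?("max-age=120, public", fetched_at, fetched_at + 60)   # true
puts fresh?("max-age=120, public", fetched_at, fetched_at + 180)  # false
puts fresh?("max-age=0, private, must-revalidate", fetched_at, fetched_at + 1) # false
```

Within the first 120 seconds no request reaches the server at all. After that, the browser can fall back to the Etag and Last-Modified checks described earlier.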

This mechanism is also used by the asset pipeline. Assets created there in the production environment can be identified clearly by the checksum in the file name and can be cached for a very long time both in the web browser and in public proxies. That's why we have the following section in the nginx configuration file in Chapter 16, Web Server in Production Mode:
location ^~ /assets/ {
  gzip_static on;
  expires max;
  add_header Cache-Control public;
}
