An ETag is not about aliens

So what is this ETag thing. I am working since 1998 in the web business but I came across ETags some weeks ago when some of my customers starting to worry about their perfomace. Don’t get me wrong, out apllication worked – as always – fine, but some pages where slower as the equivalent pages at the websites of their competitors. We checked our monitoring data, it’s always good to know of the webstite performed on yesterday, yesterweek or yestermonth, but were unable to find some issues. So we went from the server to the protocol and the client and checked some static and dynamic pages. As I was taking a closer look at a static webpage I stumbled over a HTTP header which I didn’t know: ETag.

So well, my first blog post with some contentent. I am quite excited and somewhat terrified, I really hope my english will be good enough.

So what is this ETag thing. I am working since 1998 in the web business but I came across ETags some weeks ago when some of my customers starting to worry about their perfomace. Don’t get me wrong, out apllication worked – as always – fine, but some pages where slower as the equivalent pages at the websites of their competitors. We checked our monitoring data, it’s always good to know of the webstite performed on yesterday, yesterweek or yestermonth, but were unable to find some issues. So we went from the server to the protocol and the client and checked some static and dynamic pages. As I was taking a closer look at a static webpage I stumbled over a HTTP header which I didn’t know: ETag.

What is this ETag thingy? Why do only static pages have it? Okay to get this clear, I have to reveal some secrets about our customers setup. It runs on an Apache webserver who delegates the dynamic webpages to a Tomcat servlet container. They dynamic content will be generated there by some servlets and JSP. But this is also the solution. The static page was deliviered by the Apache HTTPD and the server adds the ETag header to all files.

So again what is this ETag thingy? What is it good for? Something I learned some years ago: It’s always a good Idea to go to the source. In this case its the RFC 2616 in Section 14.19 the ETag HTTP header field is defined. But nowadays it’s quite okay if you check Wikipedia first and at least a Wikipedia you will get a link to the RFC.

I confess it took a little while for me to understand what the ETag is good for. The web server may create an entity tag for the requested entity. This tag acts like some kind of content identifier or a kind of version number of the content. The Browser can use this information at a later time, when the entity (URL) is requested again send the ETag value in the “If-Match” or more likely the “If-None-Match” header field. If the browser does so, the server may evaluate the header and send a simple 304 (Not Modified) response instead of a 200 (OK) response and the data of the entity. The whole thing works basically equally to the “Last-Modified” and “If-Modified-Since” or “If-Unmodified-Since” mechanism. So why to we need this redundancy?

I am not that deep into the specification process of the HTTP protocol, but I am working a while with distrubuted web applications. Sometimes you simply need to mess around with the latest modification date of a file to get some rsync-ing right or something like that. In general the bad news is: The last modified date is not that reliable if you want to determine if a file is still the same. So this ETag thingy seems to be a good solution to this.

In real life nothing is this easy. The Apache web server calculates the ETag value out of the INode of the file, the last modified date and the size of the file. This is quite awkward if you have more than one HTTPD instance delivering the same file (i.e. if you use DNS round robin load balancing). It would be a small miracle if the file gets an INode with the same id on the different instances. So the ETag mechanism may get some kind of useless, so that Yahoo! recommends in their Best Practices for Speeding Up Your Web Site to deactivate the mechanism. The logic behind this recommendation is: A useless check done is a waste of time. I am not totally lucky with this but Yahoo! has some kind of point with it.

Luckily you can configure the ETag creation in the Apache web server with the FileETag directive. So may use only the modification time (MTime) and the file size (Size) for the ETag creation. This should work out quite well and if you can keep the last modified date in sync between the your web server instances the mechanism should work properly.

Okay I hope someone else reads this and may find it useful. At the moment I am investigating some ETag and last modified header issues with servlets and JSP. I guess I will try to blog about it soon …

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>