Optimizing for Google Webcache

Google Webcache is a saved copy of your pages, downloaded by the Googlebot crawler. Visitors can open the cached version of a page through the link shown next to each URL on the Google search results page.

By default, Google offers the most recently scanned version of your page to searchers through the "Cached" link on the page options menu.

Google provides this alternate copy of your page in case your web server ever goes down, and it is also useful for pages that update often. Most importantly, it gives the user an indication of what the Google engine is basing its search results on.

How Google transforms your page

To avoid confusing users, Google adds a fairly unobtrusive header to the top of each cached page showing when Googlebot last fetched the document. The header also links to a "Text-only" version, stripped of images and attached CSS stylesheets.

To create this header, Google Webcache prepends a small block of code to the page's HTML source:

<!DOCTYPE html>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<base href="http://en.wikipedia.org/wiki/Web_cache">
<div style="background:#fff;border:1px solid #999;margin:-1px -1px 0;padding:0;">
  <div style="background:#ddd;border:1px solid #999;color:#000;font:13px arial,sans-serif;font-weight:normal;margin:12px;padding:8px;text-align:left">
    This is Google&#39;s cache of <a href="http://en.wikipedia.org/wiki/Web_cache" style="text-decoration:underline;color:#00c">http://en.wikipedia.org/wiki/Web_cache</a>.
    It is a snapshot of the page as it appeared on Mar 15, 2015 01:19:22 GMT.
    The <a href="http://en.wikipedia.org/wiki/Web_cache" style="text-decoration:underline;color:#00c">current page</a> could have changed in the meantime.
    <a href="http://support.google.com/websearch/bin/answer.py?hl=en&amp;p=cached&amp;answer=1687222" style="text-decoration:underline;color:#00c">Learn more</a><br>
    Tip: To quickly find your search term on this page, press <b>Ctrl+F</b> or <b>⌘-F</b> (Mac) and use the find bar.<br><br>
    <div style="float:right">
      <a href="http://webcache.googleusercontent.com/search?q=cache:K_eVF5aZQckJ:en.wikipedia.org/wiki/Web_cache&amp;hl=en&amp;gl=us&strip=1" style="text-decoration:underline;color:#00c">Text-only version</a></div>
    <div>&nbsp;</div>
  </div>
</div>
<div style="position:relative">

<!-- THE ORIGINAL DOCUMENT SOURCE STARTS HERE -->
<!DOCTYPE html>
<html lang="en" dir="ltr" class="client-nojs">
<head>
<meta charset="UTF-8" />
<title>Web cache - Wikipedia, the free encyclopedia</title>
<meta name="generator" content="MediaWiki 1.25wmf20" />
<link rel="alternate" href="android-app://org.wikipedia/http/en.m.wikipedia.org/wiki/Web_cache" />
...

The last <div> tag is never closed, but browsers will auto-close it (more on that below). Strictly speaking, placing body elements ahead of your own <html> and <body> tags does not produce well-formed HTML. However, browsers are engineered to handle these kinds of inconsistencies, and Google seems content with the simplicity.
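As a rough illustration of how a browser might repair this markup, the sketch below shows the approximate DOM for the example above. It is an assumption based on standard HTML parsing rules, not output captured from a real cached page; the styles and content are abbreviated, and the comments are mine.

```html
<html lang="en" dir="ltr" class="client-nojs"> <!-- attributes merged in from the original <html> tag -->
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <base href="http://en.wikipedia.org/wiki/Web_cache">
</head>
<body>
  <!-- Google's header: the first element child of <body> -->
  <div style="background:#fff; ...">...</div>
  <!-- Google's wrapper: never closed in the source, so it swallows everything after it -->
  <div style="position:relative">
    <!-- the original <head> contents end up here ... -->
    <meta charset="UTF-8">
    <title>Web cache - Wikipedia, the free encyclopedia</title>
    <!-- ... followed by the original <body> contents -->
  </div>
</body>
</html>
```

The point to take away is that, on a cached page, your own markup no longer sits directly under <body>; it sits one level deeper, inside Google's wrapper.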

Can I remove the header?

Yes, a few CSS rules can hide the header completely.

If your <body> doesn't start with a <div>, you can add the following CSS to hide a div that is the very first element inside the body, which on a cached page is Google's header. My pages traditionally begin with a <nav> element, so on the live site the rule matches nothing, and this works great for me:

/* remove header from Google Webcache display */
body>div:first-child{display:none}


This selector will not work as-is if your page begins with, or is wrapped in, a <div> element, because it would hide that div on the live site. In that case, give your first div a class and restore its display, for example:

<body>
  <div id="my-container" class="authentic">
  ...

/* slightly more complicated way to remove header from Google Webcache display */
body>div:first-child{display:none}
body>div.authentic{display:inherit !important}

Done properly, this hides the Webcache header safely using nothing more than your regular stylesheet.
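One way to check the rules before relying on them is a small local test page that imitates the cached structure. The sketch below is only an approximation: the two stand-in divs are simplified, hypothetical versions of Google's markup, and the wrapper is closed here so the test file stays valid.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Webcache header test</title>
<style>
  /* the two rules from above */
  body>div:first-child{display:none}
  body>div.authentic{display:inherit !important}
</style>
</head>
<body>
  <!-- stand-in for Google's header div (simplified) -->
  <div style="background:#fff;border:1px solid #999">This is Google's cache of ...</div>
  <!-- stand-in for Google's position:relative wrapper -->
  <div style="position:relative">
    <div id="my-container" class="authentic">Page content</div>
  </div>
</body>
</html>
```

Opened in a browser, this should show only "Page content". Deleting the two stand-in divs (and keeping the container) gives the live-site case, where the second rule keeps the container visible.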

What does this mean?

Essentially, your page stays the same, unless your layout can't cope with being wrapped in Google's final position:relative div, as my front page couldn't.

Having more issues with CSS

The above CSS completely removed the Google header from my page; however, Google's second <div> was still breaking my landing page's 100% height screen fill.

Google's second <div> carries an inline style of position:relative. Inline styles beat ordinary stylesheet rules, so overriding the positioning would take an !important declaration. However, all I really needed to do was add a rule giving that div 100% height:

/* isolate the second div from Google Webcache display */
body>div:nth-child(2){height:100%}
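Percentage heights only resolve when every ancestor has a definite height, so the rule above depends on the rest of the chain being in place. A minimal sketch, assuming html and body are set to 100% and using a hypothetical landing class for the full-screen block:

```html
<style>
  /* percentage heights need a definite height on every ancestor */
  html, body { height: 100%; margin: 0; }
  /* Google's wrapper sits between <body> and the page, so it must pass the height on */
  body > div:nth-child(2) { height: 100%; }
  /* hypothetical class name for the full-screen landing block */
  .landing { height: 100%; }
</style>
```

Viewport units (for example height:100vh on the landing block) would sidestep the ancestor chain altogether, though that changes the live layout as well, so treat it as an option to test, not a drop-in fix.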

A word of caution: your HTML structure can change, so take care that in fixing a small problem (such as one introduced by Google hosting a cache of your page) you don't introduce a much larger problem with site functionality later. The worst outcome here would be hiding the very important first block of your live site from actual visitors.

Don't cache the webpage at all

If you don't like the idea of search engines like Google keeping alternate copies of your site on the web, request that they don't.

In your HTML <head> place the following meta tag:

<meta name="robots" content="noarchive" />

Placing this tag is only a request that search engines not provide links to cached content. It does not force an immediate removal of cached content, nor does it stop other bots, such as the Wayback Machine, from archiving your site online.
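If the goal is to opt out of Google's cache only, the same directive can be addressed to Google's crawler by name. A small sketch, assuming you want other crawlers left unaffected:

```html
<head>
  <!-- addressed to Google's crawler only; other robots do not act on it -->
  <meta name="googlebot" content="noarchive">
</head>
```

As with the generic robots tag, this is a request that takes effect the next time the page is crawled.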