Reputation: 7160
A forum I frequent was down today, and upon restoration, I discovered that the last two days of forum posting had been rolled back completely.
Needless to say, I'd like to get back what data I can from the forum loss, and I am hoping I have at least some of it stored in the cache files that Chrome created.
I face two problems -- the cache files have no filetype, and I'm unsure how to read them in an intelligent manner (trying to open them in Chrome itself seems to "redownload" them in a .gz format), and there are a ton of cache files.
Any suggestions on how to read and sort these files? (A simple string search should fit my needs)
Upvotes: 35
Views: 206972
Reputation: 13643
The Google Chrome cache directory $HOME/.cache/google-chrome/Default/Cache
on Linux contains one file per cache entry named <16 char hex>_0
in "simple entry format":
If you know the URI of the file you're looking for it should be easy to find. If not, a substring like the domain name, should help narrow it down. Search for URI in your cache like this:
fgrep -Rl '<URI>' $HOME/.cache/google-chrome/Default/Cache
Note: If you're not using the default Chrome profile, replace Default
with the profile name, e.g. Profile 1
.
Upvotes: 8
Reputation: 12757
Note: The flag show-saved-copy
has been removed and the below answer will not work
You can read cached files using Chrome alone.
Chrome has a feature called Show Saved Copy Button:
Show Saved Copy Button Mac, Windows, Linux, Chrome OS, Android
When a page fails to load, if a stale copy of the page exists in the browser cache, a button will be presented to allow the user to load that stale copy. The primary enabling choice puts the button in the most salient position on the error page; the secondary enabling choice puts it secondary to the reload button. #show-saved-copy
First disconnect from the Internet to make sure that browser doesn't overwrite cache entry. Then navigate to chrome://flags/#show-saved-copy
and set flag value to Enable: Primary
. After you restart browser Show Saved Copy Button will be enabled. Now insert cached file URI into browser's address bar and hit enter. Chrome will display There is no Internet connection page alongside with Show saved copy button:
After you hit the button browser will display cached file.
Upvotes: 8
Reputation: 6416
Note: The below answer is out of date since the Chrome disk cache format has changed.
Joachim Metz provides some documentation of the Chrome cache file format with references to further information.
For my use case, I only needed a list of cached URLs and their respective timestamps. I wrote a Python script to get these by parsing the data_* files under C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Cache\
:
import datetime
with open('data_1', 'rb') as datafile:
data = datafile.read()
for ptr in range(len(data)):
fourBytes = data[ptr : ptr + 4]
if fourBytes == b'http':
# Found the string 'http'. Hopefully this is a Cache Entry
endUrl = data.index(b'\x00', ptr)
urlBytes = data[ptr : endUrl]
try:
url = urlBytes.decode('utf-8')
except:
continue
# Extract the corresponding timestamp
try:
timeBytes = data[ptr - 72 : ptr - 64]
timeInt = int.from_bytes(timeBytes, byteorder='little')
secondsSince1601 = timeInt / 1000000
jan1601 = datetime.datetime(1601, 1, 1, 0, 0, 0)
timeStamp = jan1601 + datetime.timedelta(seconds=secondsSince1601)
except:
continue
print('{} {}'.format(str(timeStamp)[:19], url))
Upvotes: 0
Reputation: 5134
EDIT: The below answer no longer works see here
If the file you try to recover has Content-Encoding: gzip
in the header section, and you are using linux (or as in my case, you have Cygwin installed) you can do the following:
chrome://view-http-cache/
and click the page you want to recoverxxd -r a.txt| gzip -d
Note that other answers suggest passing -p
option to xxd
- I had troubles with that presumably because the fourth section of the cache is not in the "postscript plain hexdump style" but in a "default style".
It also does not seem necessary to replace double spaces with a single space, as chrome_xxd.py
is doing (in case it is necessary you can use sed 's/ / /g'
for that).
Upvotes: 9
Reputation: 648
EDIT: The below answer no longer works see here
Google Chrome cache file format description.
Cache files list, see URLs (copy and paste to your browser address bar):
chrome://cache/
chrome://view-http-cache/
Cache folder in Linux: $~/.cache/google-chrome/Default/Cache
Let's determine in file GZIP encoding:
$ head f84358af102b1064_0 | hexdump -C | grep --before-context=100 --after-context=5 "1f 8b 08"
Extract Chrome cache file by one line on PHP (without header, CRC32 and ISIZE block):
$ php -r "echo gzinflate(substr(strchr(file_get_contents('f84358af102b1064_0'), \"\x1f\x8b\x08\"), 10,
-8));"
Upvotes: 0
Reputation: 5096
EDIT: The below answer no longer works see here
Chrome stores the cache as a hex dump. OSX comes with xxd
installed, which is a command line tool for converting hex dumps. I managed to recover a jpg from my Chrome's HTTP cache on OSX using these steps:
Your final command should look like:
pbpaste | python chrome_xxd.py | xxd -r - image.jpg
If you're unsure what section of Chrome's cache output is the content hex dump take a look at this page for a good guide: http://www.sparxeng.com/blog/wp-content/uploads/2013/03/chrome_cache_html_report.png
Image source: http://www.sparxeng.com/blog/software/recovering-images-from-google-chrome-browser-cache
More info on XXD: http://linuxcommand.org/man_pages/xxd1.html
Thanks to Mathias Bynens above for sending me in the right direction.
Upvotes: 9
Reputation: 1627
Both chrome://cache
and chrome://view-http-cache
have been removed starting chrome 66. They work in version 65.
You can check the chrome://chrome-urls/
for complete list of internal Chrome URLs.
The only workaround that comes into my mind is to use menu/more tools/developer tools
and having a Network
tab selected.
Upvotes: 2
Reputation: 1609
EDIT: The below answer no longer works see here
In Chrome or Opera, open a new tab and navigate to chrome://view-http-cache/
Click on whichever file you want to view. You should then see a page with a bunch of text and numbers. Copy all the text on that page. Paste it in the text box below.
Press "Go". The cached data will appear in the Results section below.
Upvotes: 32
Reputation: 1235
The JPEXS Free Flash Decompiler has Java code to do this at in the source tree for both Chrome and Firefox (no support for Firefox's more recent cache2 though).
Upvotes: 0
Reputation: 2053
I had some luck with this open-source Python project, seemingly inactive: https://github.com/JRBANCEL/Chromagnon
I ran:
python2 Chromagnon/chromagnonCache.py path/to/Chrome/Cache -o browsable_cache/
And I got a locally-browsable extract of all my open tabs cache.
Upvotes: 4
Reputation: 51
I've made short stupid script which extracts JPG and PNG files:
#!/usr/bin/php
<?php
$dir="/home/user/.cache/chromium/Default/Cache/";//Chrome or chromium cache folder.
$ppl="/home/user/Desktop/temporary/"; // Place for extracted files
$list=scandir($dir);
foreach ($list as $filename)
{
if (is_file($dir.$filename))
{
$cont=file_get_contents($dir.$filename);
if (strstr($cont,'JFIF'))
{
echo ($filename." JPEG \n");
$start=(strpos($cont,"JFIF",0)-6);
$end=strpos($cont,"HTTP/1.1 200 OK",0);
$cont=substr($cont,$start,$end-6);
$wholename=$ppl.$filename.".jpg";
file_put_contents($wholename,$cont);
echo("Saving :".$wholename." \n" );
}
elseif (strstr($cont,"\211PNG"))
{
echo ($filename." PNG \n");
$start=(strpos($cont,"PNG",0)-1);
$end=strpos($cont,"HTTP/1.1 200 OK",0);
$cont=substr($cont,$start,$end-1);
$wholename=$ppl.$filename.".png";
file_put_contents($wholename,$cont);
echo("Saving :".$wholename." \n" );
}
else
{
echo ($filename." UNKNOWN \n");
}
}
}
?>
Upvotes: 5