Reputation: 43440
I would like to download Google Trends CSV data using wget, but I'm unfamiliar with wget. An example URL is:
http://www.google.com/insights/search/overviewReport?cat=71&geo=US&q=apple&date&cmpt=q&content=1&export=1
Opening this in a web browser retrieves the expected file. To do the same with wget, I tried the following command:
wget "http://www.google.com/insights/search/overviewReport?cat=71&geo=US&q=apple&date&cmpt=q&content=1&export=1" -O report.csv
which results in the following:
<html><head><title>Redirecting</title>
<meta http-equiv="refresh" content="0; url='http://www.google.com/insights/search#content=1&cat=71&geo=US&q=apple&date&cmpt=q'"></head>
<body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000"><script type="text/javascript" language="javascript">
location.replace("http://www.google.com/insights/search#content\x3d1\x26cat\x3d71\x26geo\x3dUS\x26q\x3dapple\x26date\x26cmpt\x3dq")
</script></body></html>
My first guess is that wget doesn't have access to cookies with proper authentication. Anybody?
Upvotes: 0
Views: 3327
Reputation: 368251
You are getting a redirect message. Take the URL in the location.replace bit: you get a valid index.html from Google if you use that URL in a second call to wget.
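The second call can be scripted. Here is a minimal sketch, assuming the redirect page looks like the one in the question, with the target in a meta refresh tag of the form content="0; url='...'". The helper name extract_refresh_url is made up for illustration and is not part of wget.

```shell
#!/bin/sh
# Print the target URL of an HTML <meta http-equiv="refresh"> tag.
# Assumption: the tag is written as content="N; url='...'" on a single line,
# as in the redirect page quoted in the question.
# extract_refresh_url is a hypothetical helper name, not a wget feature.
extract_refresh_url() {
    # $1: path to the HTML file that the first wget call saved
    sed -n "s/.*http-equiv=\"refresh\" content=\"[0-9]*; url='\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# Usage (needs network access, so it is left commented out):
#   wget "http://www.google.com/insights/search/overviewReport?cat=71&geo=US&q=apple&date&cmpt=q&content=1&export=1" -O report.csv
#   wget "$(extract_refresh_url report.csv)" -O index.html
```

Note that everything after the # in the extracted URL is a fragment, which is not sent to the server, so the second call returns the HTML page rather than the CSV data.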
Methinks you simply don't have the proper URL from which the CSV data is downloaded. For a working example of how to 'hit' a CGI interface with a downloader, look at R's get.hist.quote() in the tseries package.
Edit: Here is what get.hist.quote() does:
R> IBM <- get.hist.quote("IBM")
trying URL 'http://chart.yahoo.com/table.csv?s=IBM&a=0&b=02&c=1991&d=9&e=08&f=2009&g=d&q=q&y=0&z=IBM&x=.csv'
Content type 'text/csv' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... ......
downloaded 236 Kb
R>
You could hit that same URL directly, as the function's code (which you could study) does. If you need cookies, you may need to look at Duncan TL's code to hit Google Docs etc.
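If cookies do turn out to be the issue, wget can read them from a file with its --load-cookies option. This is only a sketch under assumptions: cookies.txt is a placeholder for a cookie file exported from a logged-in browser session in the Netscape format that wget reads, and the helper names fetch_with_cookies and looks_like_html are made up for illustration. Whether Google accepts such cookies for the export URL is not something this confirms.

```shell
#!/bin/sh
# Return success (0) if the file looks like an HTML page rather than CSV data.
# looks_like_html is a hypothetical helper; it only inspects the first line.
looks_like_html() {
    head -n 1 "$1" | grep -qiE '<html|<!doctype'
}

# Fetch a URL with cookies loaded from a file, then check the result.
# Assumption: the cookie file ($3) is in the Netscape cookie file format.
fetch_with_cookies() {
    # $1: URL, $2: output file, $3: cookie file
    wget --load-cookies "$3" "$1" -O "$2" || return 1
    if looks_like_html "$2"; then
        echo "got an HTML page instead of CSV; check the URL or the cookies" >&2
        return 2
    fi
}

# Usage (needs network access and a real cookie file, so it is commented out):
#   fetch_with_cookies "http://www.google.com/insights/search/overviewReport?cat=71&geo=US&q=apple&date&cmpt=q&content=1&export=1" report.csv cookies.txt
```

The check after the download catches the situation in the question, where wget exits successfully but the saved file is a redirect page.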
Upvotes: 2