Christopher DuBois

Reputation: 43440

Using wget to pull csv from Google Trends

I would like to download Google Trends csv data using wget, but I'm unfamiliar with using wget. An example URL is:

http://www.google.com/insights/search/overviewReport?cat=71&geo=US&q=apple&date&cmpt=q&content=1&export=1

Opening this with a web browser, I retrieve the expected file. To do this with wget, I tried the following command:

wget "http://www.google.com/insights/search/overviewReport?cat=71&geo=US&q=apple&date&cmpt=q&content=1&export=1" -O report.csv

which results in the following:

<html><head><title>Redirecting</title>
<meta http-equiv="refresh" content="0; url=&#39;http://www.google.com/insights/search#content=1&amp;cat=71&amp;geo=US&amp;q=apple&amp;date&amp;cmpt=q&#39;"></head>
<body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000"><script type="text/javascript" language="javascript">
    location.replace("http://www.google.com/insights/search#content\x3d1\x26cat\x3d71\x26geo\x3dUS\x26q\x3dapple\x26date\x26cmpt\x3dq")
  </script></body></html>

My first guess is that wget doesn't have access to cookies with proper authentication.

Anybody?

Upvotes: 0

Views: 3327

Answers (1)

Dirk is no longer here

Reputation: 368251

You are getting a redirect message. The page points to the URL in the location.replace bit, and you get a valid index.html from Google if you use that URL in a second call to wget.
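
To make that second call less manual, here is a minimal sketch of pulling the target out of the redirect page. The helper name extract_redirect_url is made up for illustration, and it only decodes the two JavaScript escapes that appear in the output quoted in the question (\x3d for = and \x26 for &); other escapes, or uppercase hex digits, would need more rules.

```shell
# Hypothetical helper (not part of wget): read the redirect HTML on stdin
# and print the URL passed to location.replace(), with \x3d and \x26 decoded.
extract_redirect_url() {
  # First sed: keep only the text between location.replace(" and ").
  # Second sed: turn the JavaScript escapes back into '=' and '&'.
  sed -n 's/.*location\.replace("\([^"]*\)").*/\1/p' |
    sed -e 's/\\x3d/=/g' -e 's/\\x26/\&/g'
}

# Possible usage, assuming report.csv holds the redirect page from the question:
#   url=$(extract_redirect_url < report.csv)
#   wget "$url" -O index.html
```

Note that everything after the # in that URL is a fragment, which is handled on the client side and not sent to the server, so this second call fetches the search page itself and not the csv.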

Methinks you simply don't have the proper URL from where the csv data is downloaded. For a working example of how to 'hit' a CGI interface with a downloader, look at R's get.hist.quote() in the tseries package.

Edit: Here is what get.hist.quote() does:

R> IBM <- get.hist.quote("IBM")
trying URL 'http://chart.yahoo.com/table.csv?s=IBM&a=0&b=02&c=1991&d=9&e=08&f=2009&g=d&q=q&y=0&z=IBM&x=.csv'
Content type 'text/csv' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... ......
downloaded 236 Kb

R>

You could hit that same URL directly, as shown in that function's code, which you could study. If you need cookies, you may need to look at Duncan TL's code to hit Google Docs etc.
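
If cookies do turn out to be the issue, wget itself can send them from a file. What follows is only a sketch under assumptions: the helper name fetch_with_cookies and the default file name cookies.txt are invented here, the cookie file has to be in the Netscape cookies.txt format that --load-cookies reads (for example, exported from a logged-in browser), and whether Google accepts such cookies for the csv export is not something this thread confirms.

```shell
# Hypothetical helper: download URL to a file while sending cookies from a jar.
# The WGET variable is an assumption added for dry runs; it defaults to wget.
fetch_with_cookies() {
  url=$1                      # URL to download
  out=$2                      # output file name
  cookies=${3:-cookies.txt}   # Netscape-format cookie file (assumed name)
  "${WGET:-wget}" --load-cookies "$cookies" -O "$out" "$url"
}

# Possible usage with the URL from the question:
#   fetch_with_cookies "http://www.google.com/insights/search/overviewReport?cat=71&geo=US&q=apple&date&cmpt=q&content=1&export=1" report.csv
```

Setting WGET=echo before the call prints the arguments instead of downloading, which is a cheap way to check the quoting of a URL that contains & characters.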

Upvotes: 2
