Nimit Pattanasri

Reputation: 1602

Crawl data on the app store

Does anyone know how AppShopper.com crawls the data on Apple's App Store? Do we have to simulate a browser using an automated testing tool like Watir? Is this the only way to collect the data (e.g., download statistics, price)?

Upvotes: 2

Views: 7499

Answers (3)

Saqib Saud

Reputation: 2795

Crawling is not the best method. There is a partner feed program, which is absolutely free to join and can give you the required data. Read the FAQ.

Upvotes: 9

Mario Alemi

Reputation: 1797

There are hundreds of services like the one you mention, but building your own scraper is not difficult.

Let’s say you want to see all reviews in the UK for the application with id=xxxxxxxxx (to find the id, right-click the application link in iTunes and select "Copy Link"). You should retrieve the file:

http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=xxxxxxxxx&displayable-kind=11

If you put this URL in your browser, you won’t see as much information as you would in iTunes. You might not see anything at all, and your browser may ask to open iTunes instead. Still, the URL above is the same one iTunes visits; iTunes just requests it with slightly different headers than a web browser would. To reproduce that request, you can use cURL, a command available by default on most GNU/Linux distributions that you can also install on Windows.

  1. If you are on Windows and do not have cURL installed, download it (http://curl.haxx.se/download.html), unzip it, and add the bin directory to the PATH variable (http://www.computerhope.com/issues/ch000549.htm);

  2. Open a terminal window (press Win+R, then type cmd).

Once you have cURL installed, on either Windows or *nix, copy and paste the following into your terminal (in the Windows Command Prompt, replace the single quotes with double quotes):

curl -H 'Host: itunes.apple.com' -H 'Accept-Language: en-us, en;q=0.50' -H 'X-Apple-Store-Front: 143444,5' -H 'X-Apple-Tz: 3600' -A 'iTunes/9.2.1 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/533.16' 'http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews?s=143444&id=xxxxxxxxx&displayable-kind=11'

You should now see the actual XML file that iTunes receives, with all reviews.
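If you would rather script the request than paste it into a terminal, the same idea can be sketched in Python with only the standard library. This is a rough sketch, not a tested scraper: the function names build_reviews_request and fetch_reviews are made up for illustration, the header values are simply copied from the cURL command, and it assumes the endpoint still answers the way it did when this was written.

```python
import urllib.request

# User agent string copied from the cURL command; iTunes identifies itself this way.
ITUNES_USER_AGENT = (
    "iTunes/9.2.1 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/533.16"
)


def build_reviews_request(app_id, store_front="143444"):
    """Build (but do not send) a request that imitates the iTunes client.

    store_front defaults to 143444, the UK store used in the answer.
    """
    url = (
        "http://itunes.apple.com/WebObjects/MZStore.woa/wa/customerReviews"
        f"?s={store_front}&id={app_id}&displayable-kind=11"
    )
    headers = {
        "Accept-Language": "en-us, en;q=0.50",
        "X-Apple-Store-Front": f"{store_front},5",
        "X-Apple-Tz": "3600",
        "User-Agent": ITUNES_USER_AGENT,
    }
    # The Host header is filled in automatically from the URL.
    return urllib.request.Request(url, headers=headers)


def fetch_reviews(app_id, store_front="143444", timeout=10):
    """Send the request and return the response body as text."""
    request = build_reviews_request(app_id, store_front)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_reviews with your application id in place of xxxxxxxxx performs the actual download. Keeping the request construction in its own function lets you inspect the headers without touching the network.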

Upvotes: 7

hotpaw2

Reputation: 70663

If you right-click on any link or icon in iTunes, it will give you the URL it uses to download the data it displays for the next iTunes page. The format of the HTML data changes periodically in undocumented ways. If you use wget or curl to download data from these URLs, you may also have to imitate the iTunes user agent and the national store front identifier, both of which you can get by monitoring the iTunes traffic with something like Wireshark.

Upvotes: 1
