Reputation: 57
Anyone know how to extract data from a webpage using Import.io where the data is loaded into the page via Ajax?
I am unable to extract data from below mentioned pages.
There is no issue in first page data extraction, but how do I move on to extract data from second page?
URL is given below.
<http://www.amazon.com/gp/aag/main?ie=UTF8&asin=&isAmazonFulfilled=&isCBA=&marketplaceID=ATVPDKIKX0DER&orderID=&seller=A13JB7253Q5S1B>
Upvotes: 3
Views: 1530
Reputation: 165
The data on that page is deployed using an interesting mix of technologies; it relies heavily on server side code and Javascript. That type of page can be a challenge, however, there are always methods to get the data. For example, some sellers have a page like this:
Which is very easy to extract data from, even using the magic algorithm - https://magic.import.io/?site=http:%2F%2Fwww.amazon.co.uk%2Fgp%2Fnode%2Findex.html%3Fie%3DUTF8%26marketplaceID%3DA1F83G8C2ARO7P%26me%3DA2WO1PQ2OIOIGM%26merchant%3DA2WO1PQ2OIOIGM
I had to take off the redirect=true from the URLs before it would work - just an FYI.
Other times some stores don't have such a URL, its a bit of a pain, and there URLs can be tough to figure out.
We do help some of our enterprise customers build bespoke APIs when the data is very important to them, so do feel free to get in touch. I imagine a larger scale workaround would be to create a dataset/API based on a the categories you are interested in and then to filter that larger dataset down (python or CSV style) by seller name. That would probably work!
Upvotes: 2
Reputation: 165
I managed to get a static dataset but no API. You can find that dataset at the following GUID: c7c63f1c-7081-4d4a-ad91-afe9789a6620
Thanks
Upvotes: 2