Reputation: 1088
How can I use crawler within a crawler in importIO?
For example there is a list of company details (paginated), and each company has a list of reviews (it is also paginated).
I need to crawl a company's details along with each company's "all" reviews. How can I achieve this? Do I need two tables (company and reviews)? Also how can I use importIO for this?
Upvotes: 0
Views: 601
Reputation: 165
Without knowing the specific site, it's hard to comment. For example, the way pagination is implemented on the site will affect the way you get the data, and the URL structure will also play an important part.
If you can see all the data you want in the HTML of the pages (view source / inspect element), the chances are you can have that data as an API/CSV.
So you need to either:
To answer the more general question, "How can I use a crawler within a crawler in import.io?":
Short answer = Yes, but not via the regular UI; you need to do some coding.
Long answer = Yes! You can create what we call a 'chained API' that takes the URLs from one crawled extraction and feeds them into a second extractor, which gets the rest of the info. Then you just record-match the two in your post-extraction data QA process.
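To make the chaining idea concrete, here is a minimal sketch of the pattern in Python. The function names, page structures, and toy data are all invented for illustration; they are not real import.io API calls. The point is just the flow: the first extraction yields URLs, those URLs drive a second extraction, and the results are matched back to the parent record.

```python
# Hypothetical sketch of a "chained" extraction. All names and data
# structures here are illustrative, not the import.io API.

def extract_companies(listing_page):
    """First extractor: pull each company's details plus the URL of
    its review listing from a (pre-fetched) company listing page."""
    return [
        {"company": row["name"], "details": row["details"],
         "reviews_url": row["reviews_url"]}
        for row in listing_page
    ]

def extract_reviews(reviews_url, review_pages):
    """Second extractor: walk the paginated review pages for one URL
    and flatten them into a single list of reviews."""
    reviews = []
    for page in review_pages.get(reviews_url, []):
        reviews.extend(page)
    return reviews

def chain(listing_page, review_pages):
    """Feed the URLs from the first extraction into the second one,
    then record-match each review set back to its company."""
    results = []
    for company in extract_companies(listing_page):
        results.append({
            "company": company["company"],
            "details": company["details"],
            "reviews": extract_reviews(company["reviews_url"],
                                       review_pages),
        })
    return results

# Toy data standing in for crawled pages.
listing = [
    {"name": "Acme", "details": "Widgets", "reviews_url": "/acme/reviews"},
    {"name": "Globex", "details": "Gadgets", "reviews_url": "/globex/reviews"},
]
reviews = {
    "/acme/reviews": [["great", "ok"], ["bad"]],  # two paginated pages
    "/globex/reviews": [["solid"]],
}
print(chain(listing, reviews))
```

In a real setup the two extractor functions would be HTTP calls to your two trained extractors; the record-matching step is the join on the URL key shown in `chain`.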
That is, unless all the data you want is embedded on a single URL and you need to get it all. In that case you are looking at making a connector with single-row training and a lot of XPath, but it should work!
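As a rough sketch of that "single URL, lots of XPath" approach, here is one row of output built with several XPath expressions against one page, using Python's standard-library `xml.etree.ElementTree` (which supports a limited XPath subset). The markup and field names are invented for illustration:

```python
# Sketch of extracting one row with multiple XPath expressions from a
# single page. The HTML snippet and field names are made up.
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<div class="company">
  <h1>Acme Ltd</h1>
  <span class="sector">Widgets</span>
  <ul class="reviews">
    <li>Great service</li>
    <li>Fast shipping</li>
  </ul>
</div>""")

# One output row; each column has its own XPath.
row = {
    "name": page.findtext("h1"),
    "sector": page.findtext("span[@class='sector']"),
    "reviews": [li.text for li in
                page.findall(".//ul[@class='reviews']/li")],
}
print(row)
```

The same idea carries over to a trained connector: each column of the single-row output is driven by its own XPath against the one page.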
For more information, you might want to check out the knowledge base articles at the link below: http://support.import.io/knowledgebase/topics/51287-tutorials
Thanks!
Upvotes: 2