jon

Reputation: 359

import.io web crawler with drop-down menus

I'll start off by saying that I am fairly new to this, so I apologize if there is a simple or obvious answer.

I have import.io installed and it works fine, but I'm running into a problem. The website I'm trying to scrape is http://hockeyanalysis.com/stats/index.php and, as you can see, there are several drop-down menus. The two I am interested in are the two team stats menus, season and situation.

I want to scrape data from the first 5 years and all 36 situations for each year. Yes, I know this is only 180 different possibilities and I could do them by hand, but I'm using this as a learning opportunity.

Here is an example of one of the URLs: http://hockeyanalysis.com/stats/teamstats.php?db=201415&sit=5v5&disp=1

I know that db=201415 can be changed to 201314 and so on for each year, and I also know that sit=5v5 can be 5v5home, 5v5road, 5v5close and so on. The situation values do not follow what I consider a logical pattern, but I could simply copy and paste them. What I would like to do, for example, is to have db=201415 with sit=5v5, 5v5home, 5v5road, then change db to 201314, 201213, and have import.io fill in the sit values for the others provided. Meaning, I would train it with 5 examples and it could fill in the remaining 4.

Is this possible? Is there an alternative way to go about this? I appreciate any feedback.

Upvotes: 2

Views: 318

Answers (2)

Jigno Alfred Venezuela

Reputation: 147

Did you try using an Extractor or a Crawler? Because crawlers should be able to handle this.

Just use db={num} and sit={alpha} as part of the URL in the "Where to extract data from?" part of the Advanced Crawler Settings.

Something like this:

hockeyanalysis.com/stats/teamstats.php?db={num}&sit={alpha}&disp=1$

This would tell your crawler to get data only from URLs matching the template above.

Upvotes: 0

Wilson Hsieh

Reputation: 323

In this example, import.io would be able to extract that data for you, but it would not be able to generate the URLs for you.

You will need to use an Extractor with the Bulk Extract feature. Here is a link to the Knowledge base about this subject: http://support.import.io/knowledgebase/articles/569499-extractor

The URLs can be easily generated in Excel or Google Sheets.

I created an example for you: https://docs.google.com/spreadsheets/d/17oZHwGhMHv7tYQJqaOI2FkJH2OePvyERipPtB8-GGlw/edit#gid=0
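
If a script is more convenient than a spreadsheet, the same list of URLs could also be produced with a few lines of Python. This is only a rough sketch under some assumptions: it takes the five seasons ending with 201415 (the question does not say exactly which five are wanted), it lists only the four sit values named in the question (the rest of the 36 would have to be copied from the drop-down), and the helper names season_codes and build_urls are made up for illustration.

```python
from itertools import product
from urllib.parse import urlencode

# Base URL taken from the example in the question.
BASE_URL = "http://hockeyanalysis.com/stats/teamstats.php"

# Only the situations named in the question; the remaining values
# would need to be copied from the site's drop-down menu.
SITUATIONS = ["5v5", "5v5home", "5v5road", "5v5close"]


def season_codes(last_start_year, count):
    """Return season codes such as '201415' for `count` seasons, newest first."""
    codes = []
    for start in range(last_start_year, last_start_year - count, -1):
        # '2014' followed by the last two digits of the next year -> '201415'
        codes.append(f"{start}{(start + 1) % 100:02d}")
    return codes


def build_urls(seasons, situations):
    """Combine every season with every situation into a full URL."""
    return [
        f"{BASE_URL}?{urlencode({'db': db, 'sit': sit, 'disp': 1})}"
        for db, sit in product(seasons, situations)
    ]


if __name__ == "__main__":
    # Assumption: the five seasons ending with 2014-15.
    for url in build_urls(season_codes(2014, 5), SITUATIONS):
        print(url)
```

The printed list can then be pasted into Bulk Extract in the same way as the spreadsheet column.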

Upvotes: 2
