Reputation: 11
I'm doing a little project for my class and I'm just a beginner, so please forgive me if I mix up some of my terminology.
Basically, I'm creating an interactive journey planner for my city's public transit system. Unfortunately, they haven't made all the data I need publicly available. So instead of putting all my time into gathering the data for personal use, I've opted to do some screen scraping - letting their servers calculate the journey info from a START and STOP variable and then displaying the selected info on my page.
So is it possible to fill out a form's fields remotely, and then scrape the data on the page that subsequently loads? And if so, what would be the quickest, most convenient way? This happens to be a case where the data can't be manipulated via the URL, so it has to access the data by filling out the form first.
The website in question: http://jp.translink.com.au/travel-information/journey-planner
Upvotes: 1
Views: 227
Reputation: 1956
Here is what you can do:
1.) Send a POST request to the journey-planner with form data like the following. Be aware that CORS might get in the way if you send it from browser-side JavaScript; in that case you could send the request server-side, for example with cURL via PHP:
Start:Wickham Tce, Spring Hill
End:Upper Edward St, Spring Hill
SearchDate:10/05/2013 12:00:00 AM
TimeSearchMode:LeaveAfter
SearchHour:7
SearchMinute:40
TimeMeridiem:AM
TransportModes:Bus
TransportModes:Train
TransportModes:Ferry
MaximumWalkingDistance:1500
WalkingSpeed:Normal
ServiceTypes:Regular
ServiceTypes:Express
ServiceTypes:NightLink
FareTypes:Standard
FareTypes:Prepaid
FareTypes:Free
2.) You will get a new response location. This seems to be a REST link. The important part for you is the id at the end. You will have to request that page, parse the HTML, and look for the div with the HTML id option-summaries, where you will find more information within the divs travel-option-1 to travel-option-n. You have to inspect the markup carefully to find out which information is stored where and how you will be able to use it.
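The two steps above could be sketched roughly like this in Python, using only the standard library. This is a sketch under assumptions, not a tested client for the site: the field names and values are copied from the list above, the helper names build_form_body and fetch_results are made up for illustration, and it assumes the server still answers the POST with a redirect to the results page. Repeated fields such as TransportModes are sent as repeated key/value pairs, which is how an HTML form submits multiple checkboxes.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PLANNER_URL = "http://jp.translink.com.au/travel-information/journey-planner"


def build_form_body(start, end):
    """Encode the form fields as an application/x-www-form-urlencoded body.

    A list of tuples (not a dict) is used so that keys like
    TransportModes can appear more than once.
    """
    fields = [
        ("Start", start),
        ("End", end),
        ("SearchDate", "10/05/2013 12:00:00 AM"),
        ("TimeSearchMode", "LeaveAfter"),
        ("SearchHour", "7"),
        ("SearchMinute", "40"),
        ("TimeMeridiem", "AM"),
        ("TransportModes", "Bus"),
        ("TransportModes", "Train"),
        ("TransportModes", "Ferry"),
        ("MaximumWalkingDistance", "1500"),
        ("WalkingSpeed", "Normal"),
        ("ServiceTypes", "Regular"),
        ("ServiceTypes", "Express"),
        ("ServiceTypes", "NightLink"),
        ("FareTypes", "Standard"),
        ("FareTypes", "Prepaid"),
        ("FareTypes", "Free"),
    ]
    return urlencode(fields).encode("ascii")


def fetch_results(start, end):
    """POST the form and return (final_url, html).

    Passing data= makes this a POST. urlopen follows the redirect,
    so geturl() is the results URL with the id at the end
    (assumption: the site still responds with a redirect).
    """
    request = Request(PLANNER_URL, data=build_form_body(start, end))
    with urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
        return response.geturl(), html
```

Calling fetch_results("Wickham Tce, Spring Hill", "Upper Edward St, Spring Hill") would then give you the results URL and the HTML to parse in step 2, provided the site accepts the request.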
To find such things, you should learn how to use Firebug or Chrome's developer tools.
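Once the developer tools have shown you the structure, pulling the text out of the travel-option divs might look like the sketch below. It uses Python's built-in html.parser and collects the text fragments inside every div whose id matches travel-option-N. The class and function names are hypothetical, and the real page may nest its data differently, so treat this as a starting point rather than a finished parser. For simplicity it does not check that the divs sit inside option-summaries.

```python
import re
from html.parser import HTMLParser


class TravelOptionParser(HTMLParser):
    """Collect text found inside <div id="travel-option-N"> elements."""

    def __init__(self):
        super().__init__()
        self.options = {}     # div id -> list of text fragments
        self._current = None  # id of the travel-option div we are inside
        self._depth = 0       # nesting depth of divs within that div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            # A nested div inside the current option: track its depth
            # so we know which closing tag ends the option.
            self._depth += 1
            return
        div_id = dict(attrs).get("id") or ""
        if re.fullmatch(r"travel-option-\d+", div_id):
            self._current = div_id
            self._depth = 1
            self.options[div_id] = []

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and text:
            self.options[self._current].append(text)


def parse_travel_options(html):
    """Return a dict mapping each travel-option id to its text fragments."""
    parser = TravelOptionParser()
    parser.feed(html)
    parser.close()
    return parser.options
```

You would pass it the HTML of the results page and then decide, by looking at the fragments, which ones hold the times, routes, and fares you want to show.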
This is one way to solve your problem. It is probably not the best, but it is still better than "screen-scraping" anything. It will, however, demand a fair amount of skill and effort. Furthermore, if the data provider changes the page even slightly, your solution will stop working. They might also prevent your access through CORS or by other means, such as blocking your IP.
Upvotes: 1