Natanael

Reputation: 2420

Chrome/Firefox web browser automation for collecting data

I would like to browse a website automatically to collect some data.

There's a page with a form that consists of a select and a submit button. Selecting an option and clicking the submit button leads to another page containing several tables with related data.

I need to collect this data for each option and save it to a file. I will probably need to go back to the first page and repeat the task for each option. The catch is that I don't know the exact number of options in advance.

Ideally I would do this with Firefox or Chrome. I think the only way to do it is programmatically.

Could someone point me to an easy and fast way to do this? I know a little Java, JavaScript and Python.

Upvotes: 0

Views: 1317

Answers (3)

Natanael

Reputation: 2420

I found a solution to my problem. It's called HtmlUnit:

http://htmlunit.sourceforge.net/gettingStarted.html

HtmlUnit is a "GUI-Less browser for Java programs".

It allows web browsing and data collection from Java, and it's very simple and easy to use.

It's not exactly what I asked for, but it works better, at least for me.

Upvotes: 1

Alex Weinstein

Reputation: 9891

Since the task is relatively well constrained, I would avoid Selenium (it's a little brittle), and instead try this approach:

  1. Get a complete list of the options from the first page and record it in a text file.
  2. Using a network monitoring tool like Fiddler, capture the traffic that is sent when you submit the first page. See exactly what is submitted to the server, and how (POST vs GET, parameter encoding, etc.).
  3. Use a tool like curl to replay the request in the exact format that you captured in step 2. Then write a batch script (using bash or Python) that runs through all the values in the text file from step 1 and calls curl for each value in the dropdown. Save the curl output to files.
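
The steps above could be sketched in Python with only the standard library, in place of curl plus a batch script. This is a rough outline, not a tested solution for your site: `form_url`, `action_url` and `field_name` are hypothetical placeholders, and the form-encoded POST is an assumption that you would replace with whatever the captured traffic actually shows.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen


class OptionCollector(HTMLParser):
    """Collects the value attribute of every <option> element."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            value = dict(attrs).get("value")
            if value:  # skips options with a missing or empty value
                self.values.append(value)


def extract_option_values(html):
    """Step 1: list the option values found in the form page."""
    parser = OptionCollector()
    parser.feed(html)
    return parser.values


def fetch_result(action_url, field_name, value):
    """Step 3: replay the submission for one value (assumed form-encoded POST)."""
    body = urlencode({field_name: value}).encode("utf-8")
    with urlopen(action_url, data=body) as response:
        return response.read()


def scrape_all(form_url, action_url, field_name):
    """Fetch the form page, then save one result file per option."""
    with urlopen(form_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for index, value in enumerate(extract_option_values(html)):
        with open(f"result_{index}.html", "wb") as out:
            out.write(fetch_result(action_url, field_name, value))
```

If the capture shows a GET request, cookies, or hidden fields, the request in `fetch_result` would need to be adjusted to match.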

Upvotes: 1

gerrytan

Reputation: 41143

You might want to search for a "web browser automation" tool such as Selenium. Although it is not designed exactly for this purpose, I think it can be used to meet your requirement.
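
As a rough illustration of what that could look like with Selenium's Python bindings: the sketch below assumes the page has a single `<select>` and one submit control, that the `selenium` package and a Firefox driver are installed, and that the results page is complete once the click returns. `collect_all`, `output_name` and `form_url` are hypothetical names chosen for the example.

```python
import re


def output_name(value):
    """Turn an option value into a safe file name."""
    return "result_" + re.sub(r"[^A-Za-z0-9_.-]", "_", value) + ".html"


def collect_all(form_url):
    # Imported here so output_name() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    driver = webdriver.Firefox()  # webdriver.Chrome() is used the same way
    try:
        driver.get(form_url)
        dropdown = Select(driver.find_element(By.TAG_NAME, "select"))
        # Read all values up front, so the number of options need not be known.
        values = [option.get_attribute("value") for option in dropdown.options]
        for value in values:
            if not value:
                continue  # skip placeholder entries
            driver.get(form_url)  # reload the form instead of navigating back
            Select(driver.find_element(By.TAG_NAME, "select")).select_by_value(value)
            driver.find_element(By.CSS_SELECTOR, "[type=submit]").click()
            with open(output_name(value), "w", encoding="utf-8") as out:
                out.write(driver.page_source)
    finally:
        driver.quit()
```

This saves the whole results page for each option; extracting the tables from the saved HTML would be a separate step.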

Upvotes: 2
