DevBW

Reputation: 117

Automating HTTP Requests

I work with a team whose only way to add a user to their company's database is to navigate through and fill out roughly five pages of web forms in their browser. Truly brutal stuff. I've developed web automation scripts in VBScript, Java (with Selenium WebDriver) and iMacros, but all of these solutions are slow. They also depend on the browser, which I'm trying to move away from.

I'm looking for a new platform, possibly some scripting technique or language that will let me issue HTTP requests and read HTTP responses, and then build my script around that. The script would perform calculations on the HTTP responses, use file I/O, and use this data to issue further HTTP requests. Again, I'm just spitballing here. If anyone has a better solution, I'm all ears!

My question for you is: Accepting the team's limitations (read-only DB access), how would you approach a solution and what tools/languages/platforms would you use to do so?

Broad and ambiguous answers are welcome. Thank you for your time.

Upvotes: 0

Views: 5651

Answers (2)

Grisk

Reputation: 328

I would start by looking into Node.js as a platform. Its built-in http module is a powerful way to write applications that need to make many HTTP requests with unusual structure, and Node can communicate easily with a browser or with basically anything else you might need. If you need file I/O, look at the fs (File System) module.

If you wanted to get really fancy and use WebSockets to build a dynamic web app as a front end for your tool, you could do that too, so there's a lot of flexibility.
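
As a rough illustration of the http + fs combination (a sketch, not a tested solution for your site): the code below POSTs one form the way a browser would, reads the whole response, and appends each result to a log file. It assumes an HTTPS site that accepts application/x-www-form-urlencoded posts; the URL, field names and file name in the usage line are hypothetical.

```javascript
const https = require('https');
const fs = require('fs');
const { URL, URLSearchParams } = require('url');

// Encode form fields the way a browser submits an HTML form
// (application/x-www-form-urlencoded, spaces become "+").
function encodeForm(fields) {
  return new URLSearchParams(fields).toString();
}

// POST a form and resolve with { statusCode, headers, body } once the
// whole response has been read. Assumes HTTPS (default port 443).
function postForm(url, fields, extraHeaders = {}) {
  const body = encodeForm(fields);
  const target = new URL(url);
  const options = {
    method: 'POST',
    hostname: target.hostname,
    port: target.port || 443,
    path: target.pathname + target.search,
    headers: Object.assign({
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(body)
    }, extraHeaders)
  };
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve({
        statusCode: res.statusCode,
        headers: res.headers,
        body: Buffer.concat(chunks).toString('utf8')
      }));
    });
    req.on('error', reject);
    req.end(body); // send the encoded form as the request body
  });
}

// Append one JSON record per line so later runs can read results back with fs.
function logResult(file, record) {
  fs.appendFileSync(file, JSON.stringify(record) + '\n');
}
```

Usage would look something like postForm('https://intranet.example.com/users/new', { firstName: 'Ada' }).then((res) => logResult('results.log', { status: res.statusCode })). Returning a Promise keeps a five-page workflow readable as a sequence of awaited steps.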

Upvotes: 1

theideasmith

Reputation: 2925

I agree with @Grisk on using Node.js/io.js as a platform. It was designed from the ground up for I/O, which makes it well suited to your problem. The Node community is also extraordinarily active: npm, the Node.js package manager, hosts thousands of easily installable modules. To avoid any future confusion: Node.js is neither a language nor a backend framework; it is a JavaScript runtime built on Google's V8 engine, plus a set of built-in modules for writing I/O-heavy applications. Read up on Node online.

As for your specific problem, I'd say you have two options:

  1. Impersonate a browser by sending the same cookies and headers it would send.
  2. Programmatically navigate through the website, as you have been doing.

For the first option, you'd need to determine manually (your browser's developer tools will show this) which cookies are sent to the server when each page's form is submitted, and then have your script produce those cookies and include them in its HTTP requests. Check out the Node.js http documentation for more information on customizing request headers.
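
A minimal sketch of the cookie part, assuming the site uses ordinary session cookies: keep a small jar of name/value pairs, update it from each response's Set-Cookie headers, and send it back in the Cookie header of the next request. The function names are made up for illustration, and cookie attributes such as Path, Domain and Expires are deliberately ignored, which a real site may or may not tolerate.

```javascript
// Update a simple in-memory cookie jar (a Map of name -> value) from a
// response's Set-Cookie headers. Node's http module exposes
// res.headers['set-cookie'] as an array of strings.
// Attributes (Path, Domain, Expires, HttpOnly, ...) are ignored here.
function updateJar(jar, setCookieHeaders = []) {
  for (const header of setCookieHeaders) {
    const pair = header.split(';', 1)[0]; // keep only "name=value"
    const eq = pair.indexOf('=');
    if (eq <= 0) continue;                // skip malformed entries
    jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return jar;
}

// Serialize the jar into a Cookie request header, e.g. "a=1; b=2".
function cookieHeader(jar) {
  return Array.from(jar, ([name, value]) => `${name}=${value}`).join('; ');
}
```

In a response callback you would call updateJar(jar, res.headers['set-cookie']) and then set the next request's Cookie header to cookieHeader(jar). A later Set-Cookie with the same name overwrites the earlier value, which is what a rotating session ID needs.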

Your headers will need to look something like this:

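// Hypothetical placeholder values: substitute your site's host, origin and cookies.
var headers = {
    'Host': 'intranet.example.com',              // website host address
    'Origin': 'https://intranet.example.com',    // website origin
    'Referer': 'https://intranet.example.com/',  // page the request comes from
    'User-Agent': 'Opera/9.52 (X11; Linux i686; U; en)',
    'Cookie': 'SESSIONID=abc123'                 // cookie(s) sent over by the server
};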
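
To plug a header object like this into Node's https.request() or https.get(), one option is a small helper that derives Host and Origin from the page URL instead of hard-coding them. This is a hypothetical helper that assumes an HTTPS site; Node also sets the Host header itself when you don't, so that line is optional.

```javascript
const { URL } = require('url');

// Build https.request()/https.get() options for one step of the form
// workflow, sending the headers a browser would send.
// Assumes HTTPS: the port defaults to 443 when the URL has none.
function browserLikeOptions(pageUrl, referer, cookie, method = 'GET') {
  const url = new URL(pageUrl);
  const headers = {
    'Host': url.host,          // hostname plus port, if any
    'Origin': url.origin,      // scheme://host[:port]
    'Referer': referer,        // the page the "click" came from
    'User-Agent': 'Opera/9.52 (X11; Linux i686; U; en)'
  };
  if (cookie) headers['Cookie'] = cookie; // only send cookies we actually have
  return {
    method,
    hostname: url.hostname,
    port: url.port || 443,
    path: url.pathname + url.search,
    headers
  };
}
```

A request would then be https.get(browserLikeOptions(nextPageUrl, previousPageUrl, cookie), (res) => { /* read the response */ }), with nextPageUrl, previousPageUrl and cookie coming from your own script.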

I recently came across the node-icloud library, which uses the first method described above to provide programmatic access to one's iCloud account. I strongly suggest reading through its code here to see how it works.

I'd also suggest reading up on HTTP headers here.

For the second option, check out PhantomJS and Zombie.js. PhantomJS is nice because it is headless: it runs without a visible browser window. I'm not sure how the speed of these two libraries compares to what you have already been doing, but they are worth testing out.

One last thing: I would recommend building a custom JSON-based DSL for automating interaction with webpages, so you can easily redesign your browser-interaction workflows.
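
Here is one possible shape for such a DSL, purely as a sketch: the workflow is a JSON array of steps, {{name}} placeholders are filled from a per-user record (for example a row read from a file), and an optional extract regex pulls a value such as a generated ID out of a response for use in later steps. The step format, field names, paths and the send function are all hypothetical; send would wrap whatever HTTP code you end up using.

```javascript
// Hypothetical workflow: two form submissions, the second using an ID
// scraped from the first response.
const workflow = [
  { "method": "POST", "path": "/users/new",
    "form": { "first": "{{first}}", "last": "{{last}}" },
    "extract": { "userId": "name=\"userId\" value=\"([^\"]+)\"" } },
  { "method": "POST", "path": "/users/{{userId}}/roles",
    "form": { "role": "{{role}}" } }
];

// Replace every {{key}} with vars[key]; unknown keys are left untouched.
function fillTemplate(text, vars) {
  return text.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match);
}

// Turn one step into a concrete request description.
function renderStep(step, vars) {
  const form = {};
  for (const [name, value] of Object.entries(step.form || {})) {
    form[name] = fillTemplate(value, vars);
  }
  return { method: step.method, path: fillTemplate(step.path, vars), form };
}

// Store the first capture group of each "extract" regex in vars.
function applyExtract(step, body, vars) {
  for (const [name, pattern] of Object.entries(step.extract || {})) {
    const m = new RegExp(pattern).exec(body);
    if (!m) throw new Error(`step ${step.path}: could not extract ${name}`);
    vars[name] = m[1];
  }
  return vars;
}

// Run the steps in order. send(request) is any async function that performs
// the HTTP request and resolves with the response body as a string.
async function runWorkflow(steps, record, send) {
  const vars = Object.assign({}, record);
  for (const step of steps) {
    const body = await send(renderStep(step, vars));
    applyExtract(step, body, vars);
  }
  return vars;
}
```

With this split, changing the page sequence or the field mapping means editing the JSON, while the interpreter and the HTTP code stay the same.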

Finally, if you choose Node.js, an understanding of Node streams and of the details behind its event loop would be beneficial.

Best of luck!

Upvotes: 2
