Minghui Yu

Reputation: 1363

How can I use the browser to run custom JavaScript on a page (client side) to simulate clicking?

I want to automatically grab some content from a page.

I wonder if it is possible:

  1. Run my own JavaScript on the page after it loads (I use Firefox; I can't change the content of the page, I just want to run JS in my browser). The script will use getElementById or a similar method to get the link to the next page

  2. Run JavaScript to collect the content I am interested in (some URLs) on that page and store those URLs in a local file

  3. Go to the next page (the next page really will be loaded in my browser, but I should not need to intervene at all) and repeat steps 1 and 2 until there is no next page. (A sketch of the kind of per-page routine I mean follows this list.)
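For illustration, a minimal sketch of one pass of that routine, assuming a hypothetical next-page link with id "next" and a made-up selector for the interesting links:

// Sketch of one pass of the routine above (the id 'next' and the
// selector 'a.interesting' are assumptions; adjust to the real page).
window.addEventListener('load', function () {
  // Step 2: collect the URLs of interest on this page.
  var urls = [];
  var anchors = document.querySelectorAll('a.interesting');
  for (var i = 0; i < anchors.length; i++) {
    urls.push(anchors[i].href);
  }
  // Step 3: follow the next-page link, if there is one.
  var next = document.getElementById('next');
  if (next) {
    window.location.href = next.href;
  }
});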

The classic way to do this is to write a Perl script using LWP, a PHP script using cURL, etc. But those all run outside the browser; I wonder if I can do it client side, in the browser itself.

Upvotes: 4

Views: 6395

Answers (2)

Jeremy J Starcher

Reputation: 23863

I do something rather similar, actually.

By using GreaseMonkey, you can write a user-script that will interact with the pages however you need. You can get the next page link and scrape things as you like.
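A skeleton of such a user-script might look like this (the @include pattern and the id 'next' are placeholders, not from any particular site):

// ==UserScript==
// @name     Link collector (sketch)
// @include  http://example.com/*
// ==/UserScript==

// Find a hypothetical next-page link and follow it.
var next = document.getElementById('next');
if (next) {
  window.location.href = next.href;
}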

You can also store any data locally within Firefox, through the functions GM_getValue and GM_setValue.
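A minimal sketch of that, assuming classic Greasemonkey where stored values are strings (so JSON is used for the array; newer script managers may additionally require @grant lines for these functions):

// Carry the collected links across page loads with GM_getValue/GM_setValue.
var saved = JSON.parse(GM_getValue('savedLinks', '[]'));
saved.push({ url: location.href, desc: document.title });
GM_setValue('savedLinks', JSON.stringify(saved));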

I take the lazy way out: I just generate a long list of the URLs I find while navigating the pages, then do a crude document.write to dump that list out as a batch file that runs wget.

At that point I copy-and-paste the batch file and run it.
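A sketch of that dump step, assuming a savedLinks array like the one in the code further down (the wget quoting is an assumption about the target shell):

// Emit the collected URLs as wget commands and replace the page with
// that text, ready to copy into a batch file.
var lines = savedLinks.map(function (item) {
  return 'wget "' + item.url + '"';
});
document.write('<pre>' + lines.join('\n') + '</pre>');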

If you need to run this often enough that it should be automated, there used to be a way to turn GreaseMonkey scripts into Firefox extensions, which have access to more power.

Another option is currently (AFAIK) Chrome-only: you can collect whatever information you need, build a large file from it, then use the download attribute of a link to save everything with a single click.
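A rough sketch of that single-click save, using a data: URI as the link target (the file name 'links.txt' is made up):

// Offer the collected data as a one-click download via the
// (then Chrome-only) download attribute.
var text = JSON.stringify(savedLinks, null, 2);
var a = document.createElement('a');
a.href = 'data:text/plain;charset=utf-8,' + encodeURIComponent(text);
a.download = 'links.txt';   // suggested file name for the save dialog
a.textContent = 'Save collected links';
document.body.appendChild(a);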

Update

I was going to share the full code for what I was doing, but it was so tied to a particular website that it wouldn't really have helped -- so I'll go for a more "general" solution.

Warning: this code was typed on the fly and may not be entirely correct.

// Define the container.
// If you are crawling multiple pages, you'd want to load this from
// localStorage instead (see the sketch after this block).
var savedLinks = [];

// Walk through the document and collect the links.
for (var i = 0; i < document.links.length; i++) {
  var link = document.links[i];

  var data = {
    url: link.href,           // the link target
    desc: link.textContent    // the link's visible text
  };

  savedLinks.push(data);
}

// Here you'd want to save your data via localStorage.


// If not on the last page, find the 'next' button and load the next page.
// [load next page here]

// If we *are* on the last page, use document.write to output our list.
//
// Note: document.write totally destroys the current document.  It really is
// quite an ugly way to do it, but in this case it works.
document.write(JSON.stringify(savedLinks, null, 2));
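As the comments mention, localStorage is the natural place to persist savedLinks between pages; a minimal sketch of that load/save step (the key name 'savedLinks' is an assumption):

// Load any links collected on earlier pages...
var savedLinks = JSON.parse(localStorage.getItem('savedLinks') || '[]');

// ... collect links as in the loop above ...

// ... then save the combined list before moving to the next page.
localStorage.setItem('savedLinks', JSON.stringify(savedLinks));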

Upvotes: 5

Nicholas Albion

Reputation: 3284

Selenium/WebDriver will let you write a simple Java/Ruby/PHP app that launches Firefox and uses its JavaScript engine to interact with the page in the browser.
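The answer mentions Java/Ruby/PHP; to stay in one language for this page, here is the same idea sketched with Selenium's JavaScript bindings (the selenium-webdriver package on Node; the URL is a placeholder):

// Drive Firefox from a small Node script and collect every link's href.
const { Builder, By } = require('selenium-webdriver');

(async function () {
  const driver = await new Builder().forBrowser('firefox').build();
  try {
    await driver.get('http://example.com/list');
    const links = await driver.findElements(By.css('a'));
    for (const link of links) {
      console.log(await link.getAttribute('href'));
    }
  } finally {
    await driver.quit();
  }
})();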

Or, if the web page does not require JavaScript to make the content you're interested in available, you could use an HTML parser in your favourite language and leave the browser out of it entirely.
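For example, a small Node script with the cheerio HTML parser (one arbitrary choice among many; the URL is a placeholder) can pull the links without any browser:

// Fetch a page and extract its links with no browser involved.
const https = require('https');
const cheerio = require('cheerio');

https.get('https://example.com/list', function (res) {
  let html = '';
  res.on('data', function (chunk) { html += chunk; });
  res.on('end', function () {
    const $ = cheerio.load(html);
    $('a').each(function () {
      console.log($(this).attr('href'));
    });
  });
});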

If you want to do it in JavaScript in Firefox, you could probably do it in a Greasemonkey script.

Upvotes: 2
