g19fanatic
g19fanatic

Reputation: 10921

How to run jquery commands on HTML in python for DOM actions/scraping?

Lets say I'm using urllib2 and cookiejar (like so) to get responses from websites. Now I'm looking for an easy way to use jQuery to essentially scrape data from the response returned from the webserver.

I understand that there are other modules that can be used in python for web-scraping (like), but is it possibly with just jQuery commands? I'm assuming I'd need some sort of js parser within python?

The reason that I am wanting to use jQuery is that I have ~20 Greasemonkey scripts(mostly written by others) that do some interesting modifications to numerous web sites and web games. They do all of the DOM modifications with jQuery. Instead of completely refactoring most of this working and dependable code, I'd like to be able to simply port it to python (enabling simple and effective automation).

Upvotes: 3

Views: 2929

Answers (2)

Lukas Graf
Lukas Graf

Reputation: 32570

pyquery is suited perfectly for this task.

It allows you to use jQuery like selectors on (X)HTML/XML from Python.

For example:

>>> from pyquery import PyQuery as pq
>>> d = pq("<html><p id="hello">Foo</p></html>")

>>> d("#hello")
[<p#hello.hello>]

>>> d('p:first')
[<p#hello.hello>]

See the complete API documentation for details, and the project page on bitbucket for the source and issue tracker.

Upvotes: 7

Martijn Pieters
Martijn Pieters

Reputation: 1121416

Use lxml to parse the HTML and use it's cssselect module:

from lxml.cssselect import CSSSelector
from lxml import etree

tree = etree.parse(document)
elements = CSSSelector('div.content')(tree)

Upvotes: 2

Related Questions