Python to get onclick values

Question

I'm using Python and BeautifulSoup to scrape a web page for a small project of mine. The page has multiple entries, each in its own HTML table row. My code partially works; however, a lot of the output is blank, and it doesn't fetch all of the results from the page or gather them onto the same line.

Sample website (as rendered):

     Artist       Title     Date        Time
35   Lorem Ipsum  FooWorld  12/10/2014  2:53:17 PM

I only want to extract the arguments passed to the onclick handler searchDB. For the row above, 'LoremIpsum' and 'FooWorld' are the only two results I want.

Here is the code I've written. It pulls some of the right values, but sometimes the values are empty.

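import re
import urllib2
import bs4

response = urllib2.urlopen(url)    # url holds the address of the page being scraped
html = response.read()
soup = bs4.BeautifulSoup(html)
properties = soup.findAll('a', onclick=True)    # every <a> tag that has an onclick attribute

for eachproperty in properties:
    print re.findall("'([a-zA-Z0-9]*)'", eachproperty['onclick'])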

What am I doing wrong?
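The symptoms in the question can be reproduced without fetching the page. The small Python 3 sketch below runs the question's pattern against a few onclick strings. Only the first string comes from the thread (it is the value printed in the answer). The others are hypothetical, invented for illustration, so whether they explain the blank output on the real site is an assumption.

```python
import re

# The pattern from the question: a quoted run of zero or more ASCII letters/digits.
PATTERN = "'([a-zA-Z0-9]*)'"

samples = [
    "searchDB('LoremIpsum','FooWorld')",   # the onclick value shown in the answer
    "searchDB('Lorem Ipsum','FooWorld')",  # hypothetical: a value containing a space
    "showMenu()",                          # hypothetical: another handler, nothing quoted
    "toggle('')",                          # hypothetical: an empty quoted argument
]

# Map each onclick string to what the question's findall call returns for it.
results = {onclick: re.findall(PATTERN, onclick) for onclick in samples}
for onclick, found in results.items():
    print(onclick, "->", found)
```

With these inputs, a link using another handler yields an empty list and the value with a space is silently dropped (only 'FooWorld' is found). Because of `*`, the empty quoted string '' also matches as an empty string. Filtering on searchDB first and using `+` addresses the last two cases. The space case needs a different character class.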

Hackaholic · Accepted Answer

Try it like this:

>>> import re
>>> for x in soup.find_all('a'):    # every <a> tag on the page
...     try:
...         # re.match only checks the start of the string, so this matches onclick="searchDB(..."
...         if re.match('searchDB', x['onclick']):
...             print x['onclick']    # do your own processing here instead of printing
...     except KeyError:    # this <a> tag has no onclick attribute
...         pass
...
searchDB('LoremIpsum','FooWorld')
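In BeautifulSoup, `x.get('onclick')` returns None instead of raising KeyError when the attribute is missing, which avoids the try/except. For a dependency-free comparison, here is a sketch of the same filtering that swaps BeautifulSoup for Python 3's standard-library `html.parser.HTMLParser`. The markup and the class name `SearchDBLinkParser` are hypothetical, since the question does not include the page's actual HTML.

```python
import re
from html.parser import HTMLParser


class SearchDBLinkParser(HTMLParser):
    """Collects the onclick values of <a> tags whose onclick starts with 'searchDB'."""

    def __init__(self):
        super().__init__()
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser lowercases tag and attribute names.
        if tag != "a":
            return
        onclick = dict(attrs).get("onclick")  # None if the attribute is absent
        # Same test as the answer: re.match only checks the start of the string.
        if onclick and re.match("searchDB", onclick):
            self.onclicks.append(onclick)


# Hypothetical markup standing in for one row of the real page.
html = """
<table>
  <tr>
    <td>35</td>
    <td><a href="#" onclick="searchDB('LoremIpsum','FooWorld')">Lorem Ipsum</a></td>
    <td><a href="#" onclick="showMenu()">menu</a></td>
    <td><a href="/about">about</a></td>
  </tr>
</table>
"""

parser = SearchDBLinkParser()
parser.feed(html)
print(parser.onclicks)
```

Only the searchDB link is collected. The showMenu link and the link without an onclick attribute are skipped without any exception handling.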

Instead of printing inside the loop, you can save the onclick value to a variable and pull out the quoted arguments:

>>> k = x['onclick']
>>> re.findall(r"'(\w+)'", k)
['LoremIpsum', 'FooWorld']
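Because '(\w+)' only accepts word characters, an argument containing a space or hyphen (for example a hypothetical 'Lorem Ipsum') would be skipped. Below is a Python 3 sketch that only looks inside a searchDB(...) call and captures any run of non-quote characters. It assumes the arguments are always single-quoted with no escaped quotes inside. `searchdb_args` is a hypothetical helper name.

```python
import re


def searchdb_args(onclick):
    """Return the single-quoted arguments of a searchDB(...) call, or [] otherwise.

    Assumes arguments are single-quoted and contain no escaped quotes.
    """
    call = re.match(r"searchDB\((.*)\)", onclick)
    if call is None:
        return []  # some other handler, e.g. showMenu(...)
    # [^']* accepts any character except a quote, so spaces and punctuation survive.
    return re.findall(r"'([^']*)'", call.group(1))


print(searchdb_args("searchDB('LoremIpsum','FooWorld')"))
print(searchdb_args("searchDB('Lorem Ipsum','Foo-World')"))
print(searchdb_args("showMenu('x')"))
```

Keeping `*` here is deliberate. Inside a confirmed searchDB call, an empty argument '' is a legitimate value rather than noise.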

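In the '(\w+)' pattern, \w is close to [a-zA-Z0-9] but not identical. It also matches the underscore, i.e. [a-zA-Z0-9_]. In Python 3 (or with re.UNICODE in Python 2) it matches non-ASCII letters and digits as well. Using + instead of the question's * also stops an empty quoted string '' from producing an empty match.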
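The difference between \w and [a-zA-Z0-9] is easy to check. This Python 3 sketch compares them on a string containing an underscore and on one containing an accented letter. The sample strings are arbitrary.

```python
import re

ASCII_CLASS = r"[a-zA-Z0-9]+"
WORD = r"\w+"

# \w also matches the underscore.
underscore_ascii = re.findall(ASCII_CLASS, "Foo_World")
underscore_word = re.findall(WORD, "Foo_World")

# For str patterns in Python 3, \w matches non-ASCII letters too, unless re.ASCII is set.
accent_ascii = re.findall(ASCII_CLASS, "Café")
accent_word = re.findall(WORD, "Café")
accent_word_ascii = re.findall(WORD, "Café", flags=re.ASCII)

print(underscore_ascii, underscore_word)               # ['Foo', 'World'] ['Foo_World']
print(accent_ascii, accent_word, accent_word_ascii)    # ['Caf'] ['Café'] ['Caf']
```

If the values on the page are guaranteed to be plain ASCII letters and digits, either pattern works. Otherwise, pick the class that matches the data you expect.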
