Fairlight Evony

Reputation: 243

Python to get onclick values

I'm using Python and BeautifulSoup to scrape a web page for a small project of mine. The page has multiple entries, each in its own HTML table row. My code partially works, but much of the output is blank, and it neither fetches all of the results from the page nor gathers them onto the same line.

<html>
<head>
<title>Sample Website</title>
</head>
<body>

<table>
<tr><td class=channel>Artist</td><td class=channel>Title</td><td class=channel>Date</td><td class=channel>Time</td></tr>
<tr><td>35</td><td>Lorem Ipsum</td><td><a href="#" onClick="searchDB('LoremIpsum','FooWorld')">FooWorld</a></td><td>12/10/2014</td><td>2:53:17 PM</td></tr>
</table>
</body>
</html>

I want to only extract the values from the onclick action 'searchDB', so for example 'LoremIpsum' and 'FooWorld' are the only two results that I want.

Here is the code that I've written so far. It pulls some of the right values, but sometimes the values are empty.

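import re
import urllib2

import bs4

# url holds the address of the page being scraped
response = urllib2.urlopen(url)
html = response.read()
soup = bs4.BeautifulSoup(html)

# keep only the <a> tags that have an onclick attribute
properties = soup.findAll('a', onclick=True)

for eachproperty in properties:
    print re.findall("'([a-zA-Z0-9]*)'", eachproperty['onclick'])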

What am I doing wrong?

Upvotes: 5

Views: 17289

Answers (2)

Steferson Ferreira

Reputation: 361

Try this:

rows = soup.findAll('tr')
for row in rows[1:]:
    cols = row.findAll('td')
    link = cols[2].find('a').get('onclick')  # in the sample, the link sits in the third cell
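A slightly more defensive take on the row-by-row idea, as a sketch in Python 3 using bs4 with the built-in html.parser. The sample HTML is adapted from the question, with an extra row that has no link, and the helper name onclick_per_row is made up for illustration. It looks for the link anywhere in the row instead of at a fixed column index, and skips rows without an onclick link.

```python
from bs4 import BeautifulSoup

# Sample adapted from the question; the last row is an added no-link case.
SAMPLE_HTML = """
<table>
<tr><td class="channel">Artist</td><td class="channel">Title</td></tr>
<tr><td>35</td><td>Lorem Ipsum</td><td><a href="#" onClick="searchDB('LoremIpsum','FooWorld')">FooWorld</a></td><td>12/10/2014</td></tr>
<tr><td>36</td><td>No link here</td><td>plain text</td><td>12/10/2014</td></tr>
</table>
"""


def onclick_per_row(html):
    """Return the onclick string of the first onclick link in each table row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        # onclick=True keeps only <a> tags that actually carry the attribute
        link = row.find("a", onclick=True)
        if link is not None:
            results.append(link["onclick"])
    return results


print(onclick_per_row(SAMPLE_HTML))
```

Rows such as the header, which contain no matching link, are simply skipped, so there is no need to slice off the first row.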

Upvotes: 3

Hackaholic

Reputation: 19763

Try it like this:

>>> import re
>>> for x in soup.find_all('a'):    # iterate over every <a> tag
...     try:
...         if re.match('searchDB', x['onclick']):    # keep only onclick values that start with searchDB
...             print x['onclick']        # do your own processing here instead of printing
...     except KeyError:    # this <a> tag has no onclick attribute
...         pass
... 
searchDB('LoremIpsum','FooWorld')

Instead of printing, you can save the attribute to a variable and pull out the quoted arguments:

>>> k = x['onclick']
>>> re.findall("'(\w+)'",k)
['LoremIpsum', 'FooWorld']

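\w is equivalent to [a-zA-Z0-9_] for ASCII text, so unlike the character class in the question it also matches underscores. The + requires at least one character, whereas the * in the question's pattern also matches an empty pair of quotes.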
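If the arguments can contain spaces or punctuation, \w+ will miss them. One option is to match the searchDB call first and then take everything between single quotes. This is a sketch in Python 3, not a full JavaScript parser: the helper name searchdb_args is hypothetical, and it assumes the arguments are single-quoted and contain no escaped quotes or parentheses.

```python
import re

# Matches a searchDB(...) call and captures the raw argument list.
# Assumption: no ")" appears inside the arguments themselves.
CALL_RE = re.compile(r"searchDB\((.*?)\)")
# Captures the text between a pair of single quotes (may be empty).
ARG_RE = re.compile(r"'([^']*)'")


def searchdb_args(onclick):
    """Return the quoted arguments of a searchDB(...) call, or [] if there is none."""
    call = CALL_RE.search(onclick)
    if call is None:
        return []
    return ARG_RE.findall(call.group(1))


print(searchdb_args("searchDB('LoremIpsum','FooWorld')"))    # ['LoremIpsum', 'FooWorld']
print(searchdb_args("searchDB('Lorem Ipsum','Foo-World')"))  # ['Lorem Ipsum', 'Foo-World']
print(searchdb_args("openWindow('x')"))                      # []
```

Returning an empty list for other handlers makes it easy to skip links that do not call searchDB.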

Upvotes: 8
