Razilator

Reputation: 85

How to extract a link from an onclick value in BeautifulSoup?

I need help scraping an image link that is stored in the onclick attribute. Here is what I have so far, but I'm stuck on how to strip everything from the onclick value except the link.

<a onclick="ShowEnlargedImagePreview( 'https://steamuserimages-a.akamaihd.net/ugc/794261971268711656/69C39CF2A2BBCDDC7C04C17DF1E88A6ED875DBE7/' );"></a>

links = soup.find('div', class_='workshopItemPreviewImageMain')
links = links.findChild('a', attrs={'onclick': re.compile("^https://")})

But this finds nothing.

links = soup.find('div', class_='workshopItemPreviewImageMain')
links = links.findChild('a')
links = links.get("onclick")

This prints the entire onclick value:

ShowEnlargedImagePreview( 'https://steamuserimages-a.akamaihd.net/ugc/794261971268711656/69C39CF2A2BBCDDC7C04C17DF1E88A6ED875DBE7/' )

But I only need the link.

Upvotes: 2

Views: 64

Answers (1)

user5386938


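You just need to change your regular expression. Your pattern ^https:// is anchored to the start of the onclick value, but that value starts with ShowEnlargedImagePreview(, so it never matches. Instead, search for a quoted http(s) URL anywhere in the value and capture it in a named group: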

from bs4 import BeautifulSoup
import re

# A URL wrapped in single or double quotes; (?P=quote) requires the same closing quote.
pattern = re.compile(r'''(?P<quote>['"])(?P<href>https?://.+?)(?P=quote)''')

data = '''
<div class="workshopItemPreviewImageMain">
<a onclick="ShowEnlargedImagePreview( 'https://steamuserimages-a.akamaihd.net/ugc/794261971268711656/69C39CF2A2BBCDDC7C04C17DF1E88A6ED875DBE7/' );"></a>
</div>
'''

soup = BeautifulSoup(data, 'html.parser')

div = soup.find('div', class_='workshopItemPreviewImageMain')

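# BeautifulSoup calls pattern.search() on each onclick value, so the URL can appear anywhere in it.
links = div.find_all('a', {'onclick': pattern})

for a in links:
    # Print only the captured URL, not the whole JavaScript call.
    print(pattern.search(a['onclick']).group('href'))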
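To make the difference concrete, here is a small sketch that uses only the re module on the onclick string from the question, with no BeautifulSoup involved. It shows that the original anchored pattern finds nothing, while the quoted-URL pattern pulls out the link. extract_href is a hypothetical helper name used for illustration, not part of BeautifulSoup; it should also work on the string returned by links.get("onclick") in the question's second attempt.

```python
import re

# The onclick value from the question's HTML.
ONCLICK = ("ShowEnlargedImagePreview( 'https://steamuserimages-a.akamaihd.net/ugc/"
           "794261971268711656/69C39CF2A2BBCDDC7C04C17DF1E88A6ED875DBE7/' );")

# The question's pattern: "^" anchors it to the start of the string,
# which is "ShowEnlarged...", so it never matches this value.
ORIGINAL = re.compile("^https://")

# The answer's pattern: a quote, an http(s) URL, then the same quote again.
QUOTED_URL = re.compile(r'''(?P<quote>['"])(?P<href>https?://.+?)(?P=quote)''')


def extract_href(onclick):
    """Return the first quoted http(s) URL in an onclick string, or None (hypothetical helper)."""
    match = QUOTED_URL.search(onclick)
    return match.group("href") if match else None


if __name__ == "__main__":
    print(ORIGINAL.search(ONCLICK))  # prints None
    print(extract_href(ONCLICK))     # prints just the image URL
```

The non-greedy .+? stops at the first matching closing quote, so any arguments after the URL in the JavaScript call are not swallowed into the result.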

Upvotes: 2
