Reputation: 85
Need help scrubbing a link to an image that is stored in the onclick=
value.
I do this, but I stopped how to remove everything in onclick except for the link.
<a onclick="ShowEnlargedImagePreview( 'https://steamuserimages-a.akamaihd.net/ugc/794261971268711656/69C39CF2A2BBCDDC7C04C17DF1E88A6ED875DBE7/' );"></a>
links = soup.find('div', class_='workshopItemPreviewImageMain')
links = links.findChild('a', attrs={'onclick': re.compile("^https://")})
But nothing is output.
links = soup.find('div', class_='workshopItemPreviewImageMain')
links = links.findChild('a')
links = links.get("onclick")
The entire value of onclick is displayed:
howEnlargedImagePreview( 'https://steamuserimages-a.akamaihd.net/ugc/794261971268711656/69C39CF2A2BBCDDC7C04C17DF1E88A6ED875DBE7/' )
But only a link is needed.
Upvotes: 2
Views: 64
Reputation:
You just need to change your regular expression.
from bs4 import BeautifulSoup
import re
pattern = re.compile(r'''(?P<quote>['"])(?P<href>https?://.+?)(?P=quote)''')
data = '''
<div class="workshopItemPreviewImageMain">
<a onclick="ShowEnlargedImagePreview( 'https://steamuserimages-a.akamaihd.net/ugc/794261971268711656/69C39CF2A2BBCDDC7C04C17DF1E88A6ED875DBE7/' );"></a>
</div>
'''
soup = BeautifulSoup(data, 'html.parser')
div = soup.find('div', class_='workshopItemPreviewImageMain')
links = div.find_all('a', {'onclick': pattern})
for a in links:
print(pattern.search(a['onclick']).group('href'))
Upvotes: 2