Reputation: 263
<li class="sre" data-tn-component="asdf-search-result" id="85e08291696a3726" itemscope="" itemtype="http://schema.org/puppies">
<div class="sre-entry">
<div class="sre-side-bar">
</div>
<div class="sre-content">
<div class="clickable_asdf_card" onclick="window.open('/r/85e08291696a3726?sp=0', '_blank')" style="cursor: pointer;" target="_blank">
I need to grab strings like '/r/85e08291696a3726?sp=0', which occur throughout a page. I'm not sure how to use the soup.find_all method to do this. The strings that I need always occur next to '
This is what I was thinking (below) but obviously I am getting the parameters wrong. How would I format the find_all method to return the '/r/85e08291696a3726?sp=0' strings throughout the page?
for divsec in soup.find_all('div', class_='clickable_asdf_card'):
    print('got links')
    x = x + 1
I read the documentation for bs4 and I was thinking about using find_all('clickable_asdf_card') to find all occurrences of the string I need but then what? Is there a way to adjust the parameters to return the string I need?
Upvotes: 3
Views: 352
Reputation: 473873
Use BeautifulSoup's built-in regular expression search to find and extract the desired substring from an onclick attribute value:
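import re
# Capture the first argument passed to window.open(...)
pattern = re.compile(r"window\.open\('(.*?)', '_blank'\)")
# find_all accepts a compiled regex as an attribute filter
for item in soup.find_all(onclick=pattern):
    print(pattern.search(item["onclick"]).group(1))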
If there is just a single element you want to find, use find() instead of find_all().
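To try this end to end, here is a minimal self-contained sketch. The sample markup is modelled on the snippet in the question. The second card's id, the `other_card` element, and the helper names `extract_links` and `extract_first_link` are made up for illustration, and `html.parser` is just one parser choice.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modelled on the question. The second id and the
# "other_card" element are hypothetical, added only to show filtering.
HTML = """
<ul>
  <li class="sre"><div class="clickable_asdf_card" onclick="window.open('/r/85e08291696a3726?sp=0', '_blank')"></div></li>
  <li class="sre"><div class="clickable_asdf_card" onclick="window.open('/r/0123456789abcdef?sp=0', '_blank')"></div></li>
  <li class="sre"><div class="other_card" onclick="doSomethingElse()"></div></li>
</ul>
"""

# Capture the first argument passed to window.open(...).
PATTERN = re.compile(r"window\.open\('(.*?)', '_blank'\)")


def extract_links(soup):
    """Return the window.open target of every element whose onclick matches."""
    return [
        PATTERN.search(tag["onclick"]).group(1)
        for tag in soup.find_all(onclick=PATTERN)
    ]


def extract_first_link(soup):
    """Return the first matching target, or None when nothing matches."""
    tag = soup.find(onclick=PATTERN)
    if tag is None:
        return None
    return PATTERN.search(tag["onclick"]).group(1)


soup = BeautifulSoup(HTML, "html.parser")
links = extract_links(soup)
first = extract_first_link(soup)
print(links)
print(first)
```

Note that find() returns None when nothing matches, so the single-element version checks for that before indexing into the tag.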
Upvotes: 2