Reputation: 426
I have a quick issue with this part of my code. I'm using BeautifulSoup to scrape a website, and I need to extract only the email address from the href attribute of an a tag that sits inside a div with a class (see below):
<div class="startup-email-link social-links-startup">
<a href="mailto:[email protected]">d</a>
</div>
And my code gives me this error: TypeError: 'int' object is not subscriptable
import requests
from bs4 import BeautifulSoup
import re
source_code = requests.get(item_url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")
for link in soup.find('div', {'class': 'startup-email-link'}):
    href = link.find('a')['href']
    print(href)
#href_final = re.compile('mailto')
#print(href_final)
Upvotes: 1
Views: 4080
Reputation: 22440
If extracting the email address is your only goal, you can do it in a few lines of code. Try the snippet below; just set item_url to the URL of the page you want to scrape.
import requests
from bs4 import BeautifulSoup
item_url = "put your url here"
soup = BeautifulSoup(requests.get(item_url).text, "lxml")
for email in soup.select(".startup-email-link a[href^='mailto:']"):
    print(email['href'])
Upvotes: 0
Reputation: 3535
soup.find already returns a single tag, so there is no need to iterate over it.
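To see where the 'int' object is not subscriptable error likely comes from, here is a small sketch. It assumes the markup from the question, with a placeholder address (someone@example.com) standing in for the obfuscated one. Iterating over the div yields its children, and the first child is the whitespace string after the opening tag; calling .find('a') on that string is the ordinary string method, which returns an integer.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the address is a placeholder.
html = """
<div class="startup-email-link social-links-startup">
<a href="mailto:someone@example.com">d</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find('div', {'class': 'startup-email-link'})

# Iterating over a Tag walks its children, including whitespace-only strings.
child_types = [type(child).__name__ for child in div]

# The first child is the newline after the opening <div>. It is a string,
# so .find('a') here is str.find and returns an integer, not a Tag.
first_child = next(iter(div))
result = first_child.find('a')

print(child_types)
print(type(result).__name__)
```

Indexing that integer with ['href'] is what raises the TypeError in the original loop.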
You can just get the link as
soup.find('div', {'class': 'startup-email-link'}).find('a')['href']
You may want to make it more robust in case the div with the class or the anchor tag is missing:
def get_email_href(soup):
    # Hypothetical wrapper so that the return statements are valid.
    div = soup.find('div', {'class': 'startup-email-link'})
    if div is None:
        return None
    anchor = div.find('a')
    if anchor is None:
        return None
    return anchor['href']
Or you can use a CSS selector if you prefer something more concise:
def get_email_href_css(soup):
    # Hypothetical wrapper so that the return statements are valid.
    selection = soup.select('div.startup-email-link > a')
    if not selection:
        return None
    return selection[0]['href']
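Since the question asks for only the email address, the mailto: prefix still has to be removed from the href. A minimal sketch of one way to do that is below; the function name extract_email and the sample address are assumptions made for illustration, and the code also drops any ?subject=... part that a mailto link may carry.

```python
from urllib.parse import unquote

from bs4 import BeautifulSoup


def extract_email(html):
    """Return the address from the first mailto link in the div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    anchor = soup.select_one("div.startup-email-link a[href^='mailto:']")
    if anchor is None:
        return None
    # Drop the scheme, then any query string such as ?subject=Hello.
    address = anchor["href"][len("mailto:"):]
    address = address.split("?", 1)[0]
    # Decode percent-escapes such as %40 for "@".
    return unquote(address)


# Placeholder markup; the address is made up for the example.
sample = """
<div class="startup-email-link social-links-startup">
<a href="mailto:someone@example.com?subject=Hello">d</a>
</div>
"""
print(extract_email(sample))
```

Returning None when nothing matches keeps the behaviour in line with the more defensive versions above.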
Upvotes: 1