jjyoh

Reputation: 426

Extract email address using BeautifulSoup (TypeError: 'int' object is not subscriptable)

I have a quick issue with this part of my code. I'm using BeautifulSoup to scrape a website, and I need to extract only the email address from the href attribute of a link that sits inside a div with a specific class (see below):

<div class="startup-email-link social-links-startup">
    <a href="mailto:[email protected]">d</a>
</div>

My code raises this error: TypeError: 'int' object is not subscriptable

import requests
from bs4 import BeautifulSoup
import re

source_code = requests.get(item_url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")

for link in soup.find('div', {'class': 'startup-email-link'}):
    href = link.find('a')['href']
    print(href)


    #href_final = re.compile('mailto')
    #print(href_final)

Upvotes: 1

Views: 4080

Answers (2)

SIM

Reputation: 22440

If extracting the email is your only goal, you can do it in a few lines of code. Try the code below; just set item_url to the website's URL.

import requests
from bs4 import BeautifulSoup

item_url = "put your url here"
soup = BeautifulSoup(requests.get(item_url).text, "lxml")
for email in soup.select(".startup-email-link a[href^='mailto:']"):
    print(email['href'])
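The loop above prints the full href, including the mailto: prefix, while the question asks for only the address. Here is a minimal sketch that strips the scheme and any ?subject=... query. It runs against hardcoded sample markup rather than a live URL, and the address and query string are hypothetical placeholders. It uses the built-in "html.parser" so lxml doesn't need to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the question; the address is a placeholder.
html = """
<div class="startup-email-link social-links-startup">
    <a href="mailto:contact@example.com?subject=Hello">d</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed

emails = []
for anchor in soup.select(".startup-email-link a[href^='mailto:']"):
    href = anchor["href"]
    # Drop the "mailto:" scheme, then anything after "?" (e.g. a subject line).
    address = href[len("mailto:"):].split("?", 1)[0]
    emails.append(address)

print(emails)
```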

Upvotes: 0

fodma1

Reputation: 3535

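soup.find already returns a single Tag, so there is no need to iterate over it. Iterating over a Tag loops over its children, and the first child here is the whitespace text before the <a>. On a text node, .find('a') is plain str.find, which returns an int (-1), and indexing that int with ['href'] raises the TypeError. You can just get the link as: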

soup.find('div', {'class': 'startup-email-link'}).find('a')['href']
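To see where the int comes from, here is a small sketch that reproduces the error step by step using the HTML from the question. The address is a hypothetical placeholder, and "html.parser" is used so whitespace text nodes are kept as children:

```python
from bs4 import BeautifulSoup, NavigableString

html = """<div class="startup-email-link social-links-startup">
    <a href="mailto:contact@example.com">d</a>
</div>"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "startup-email-link"})

# Looping over a Tag walks its children; the first one is the whitespace before <a>.
first_child = div.contents[0]
print(isinstance(first_child, NavigableString))

# NavigableString subclasses str, so .find() here is str.find and returns an int.
result = first_child.find("a")
print(result)

try:
    _ = result["href"]  # indexing an int, which is what the loop in the question does
except TypeError as exc:
    message = str(exc)
print(message)

# Calling find() on the div once, without a loop, gives the intended href.
href = div.find("a")["href"]
print(href)
```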

You may want to make it more robust in case the div with the class or the anchor tag is missing:

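def get_email_href(soup):
    """Return the href of the link in div.startup-email-link, or None if absent."""
    div = soup.find('div', {'class': 'startup-email-link'})
    if div is None:  # no matching div on the page
        return None
    anchor = div.find('a')
    if anchor is None:  # the div exists but contains no <a>
        return None
    return anchor['href']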
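The for loop in the question suggests the page might contain more than one such div. If that's the case (an assumption, not something the question confirms), a minimal sketch iterates over find_all, which returns a list of matching Tags, so each loop variable is a div rather than a child node. The sample markup and addresses below are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two matching divs plus one without a link.
html = """
<div class="startup-email-link social-links-startup">
    <a href="mailto:first@example.com">d</a>
</div>
<div class="startup-email-link social-links-startup">
    <a href="mailto:second@example.com">d</a>
</div>
<div class="startup-email-link"></div>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
# A single class in the filter still matches divs that carry several classes.
for div in soup.find_all("div", {"class": "startup-email-link"}):
    anchor = div.find("a")
    if anchor is None:  # skip divs that have no link
        continue
    hrefs.append(anchor["href"])

print(hrefs)
```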

Or you can use a CSS selector if you prefer to keep it more concise:

def get_email_href_css(soup):
    selection = soup.select('div.startup-email-link > a')
    if not selection:  # no match: the div or the <a> is missing
        return None
    return selection[0]['href']
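Since only the first match is used, select_one is a slightly shorter alternative: it returns the first matching Tag or None, so there is no list to check or index. A sketch, where email_href is a hypothetical helper name and the sample markup is a placeholder:

```python
from bs4 import BeautifulSoup

def email_href(html):
    """Return the href of the first link inside div.startup-email-link, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns the first match or None; a[href] skips anchors without an href.
    anchor = soup.select_one("div.startup-email-link > a[href]")
    return anchor["href"] if anchor is not None else None

print(email_href('<div class="startup-email-link"><a href="mailto:contact@example.com">d</a></div>'))
print(email_href("<div></div>"))
```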

Upvotes: 1
