Reputation: 1
I'm trying to scrape emails from GitHub profiles with BeautifulSoup. I can get emails from the main section of a profile, but I can't get the email shown in the sidebar (vcard). Any help is appreciated!
Here is my code:
import requests
from bs4 import BeautifulSoup

def extract_email_from_profile(profile_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    response = requests.get(profile_url, headers=headers)
    if response.status_code != 200:
        print(f"⚠️ Could not fetch {profile_url} (status {response.status_code})")
        return None
    soup = BeautifulSoup(response.content, "html.parser")
    # I tried to look into the sidebar using the class name, which is Link--primary...
    email_tag = soup.find("a", class_="Link--primary wb-break-all", href=lambda href: href and "mailto:" in href)
    if email_tag:
        return email_tag["href"].replace("mailto:", "").strip()
    # No matching mailto link was found in the fetched HTML
    return None
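One way to narrow this down might be to check whether the sidebar address is present in the downloaded HTML at all, independent of any class names. The sketch below is only an illustration under assumptions: it swaps BeautifulSoup for the standard-library `html.parser` so it runs without extra packages, `MailtoCollector` and `find_mailto_addresses` are hypothetical names, and `sample_html` is made-up markup rather than real GitHub output.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Collects addresses from every <a href="mailto:..."> in a document."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any "?subject=..." query part.
            address = unquote(href[len("mailto:"):].split("?", 1)[0]).strip()
            if address and address not in self.emails:
                self.emails.append(address)


def find_mailto_addresses(html):
    """Return all mailto addresses found in the HTML, ignoring CSS classes."""
    parser = MailtoCollector()
    parser.feed(html)
    return parser.emails


# Made-up markup for illustration only (not actual GitHub HTML).
sample_html = """
<div class="vcard-details">
  <a class="Link--primary wb-break-all" href="mailto:jane@example.com">jane@example.com</a>
</div>
<main><a class="other" href="mailto:team@example.com?subject=Hi">mail</a></main>
"""
print(find_mailto_addresses(sample_html))
```

If a check like this returns an empty list for `response.text`, the address was not in the HTML the script received, which may happen when a page shows a field only to signed-in visitors or fills it in with JavaScript; in that case no change to the selector would find it.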
Upvotes: -2
Views: 41