Balla P. Tall

Reputation: 1

Can't scrape email from GitHub sidebar (vcard) with BeautifulSoup

I'm trying to scrape email addresses from GitHub profiles with BeautifulSoup. I can get the addresses that appear in the main part of the profile, but not the one shown in the sidebar (vcard). Any help is appreciated!

Here is my code:

import requests
from bs4 import BeautifulSoup


def extract_email_from_profile(profile_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    
    response = requests.get(profile_url, headers=headers)
    
    if response.status_code != 200:
        print(f"⚠️ Could not fetch {profile_url} (status code {response.status_code})")
        return None

    soup = BeautifulSoup(response.content, "html.parser")

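    # I tried to find the sidebar link by its classes, "Link--primary wb-break-all".
    # Passing that whole string to class_ only matches the exact attribute value,
    # so a CSS selector is used here to require both classes in any order.
    email_tag = soup.select_one('a.Link--primary.wb-break-all[href^="mailto:"]')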

    if email_tag:
        return email_tag["href"].replace("mailto:", "").strip()
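
One way to narrow this down, as a minimal sketch: list every mailto: link in a piece of HTML, to see whether the sidebar address is present in the response at all. This uses only the standard library's html.parser instead of BeautifulSoup, so it has no extra dependencies. The names MailtoCollector and find_mailto_addresses and the sample markup are made up for illustration and are not taken from GitHub's actual page.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Collects the address part of every <a href="mailto:..."> link."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any "?subject=..." query part.
            address = href[len("mailto:"):].split("?", 1)[0]
            self.addresses.append(unquote(address).strip())


def find_mailto_addresses(html):
    """Return the addresses of all mailto: links found in an HTML string."""
    parser = MailtoCollector()
    parser.feed(html)
    parser.close()
    return parser.addresses


# Hypothetical markup imitating a sidebar link; not copied from GitHub.
sample_html = """
<ul class="vcard-details">
  <li><a class="Link--primary wb-break-all" href="mailto:someone@example.com">someone@example.com</a></li>
  <li><a class="Link--primary" href="https://example.com">site</a></li>
</ul>
"""

print(find_mailto_addresses(sample_html))
```

Calling find_mailto_addresses(response.text) on the real response would show whether the problem is the selector or the page itself. If it returns an empty list, the address is not in the HTML that requests received (which can happen when a page shows a field only to signed-in visitors), and no selector will find it.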

Upvotes: -2

Views: 41

Answers (0)
