neohex

Reputation: 25

How to extract email text using beautifulsoup?

I'm trying to extract the email address text but struggling. How can I get the email?

My code:

#importing libraries

import pandas as pd
import requests
from bs4 import BeautifulSoup

#send request

url = 'https://apps.shopify.com/loox?surface_detail=all&surface_inter_position=1&surface_intra_position=1&surface_type=category'
page = requests.get(url)

#parse the html

soup = BeautifulSoup(page.text, 'html.parser')

#get email

app_email = []

email_elm = soup.find_all(class_= 'app-support-list__item')

I'm trying to scrape the "[email protected]" email address from https://apps.shopify.com/loox. Here is the relevant source from the page:

</div>
    <div class="grid__item grid__item--tablet-up-third grid__item--desktop-up-3 grid__item--desktop-up-push-3">
      
<div class="block app-listing__support-section app-listing__section">
  <h3 class="block__heading heading--2">Support</h3>
  <div class="block__content">
    <ul class="app-support-list">
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a class="ui-external-link" href="http://help.loox.io" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">FAQ<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
        </li>
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-website-url" /> </svg><a class="ui-external-link" href="https://loox.app" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Developer website<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
        </li>
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-privacy" /> </svg><a class="ui-external-link" href="https://loox.io/legal/privacy_policy.pdf" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Privacy policy<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
        </li>
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-email" /> </svg>[email protected]</span>
        </li>

Upvotes: 0

Views: 1308

Answers (2)

MendelG

Reputation: 20058

You can index into the list that find_all returns and then use .text to get only the text of that element.

Instead of

email_elm = soup.find_all(class_= 'app-support-list__item')

Try:

email_elm = soup.find_all(class_="app-support-list__item")[3].text.strip()
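To check the indexing approach without fetching the page, here is a minimal sketch that runs it against a trimmed copy of the support list from the question's HTML. The real address is obscured in the question, so support@example.com is a hypothetical placeholder, and email_by_position is an illustrative helper name. It assumes the email is still the fourth item (index 3) in the list.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the support list from the question. The real address is
# obscured there, so support@example.com is a hypothetical placeholder.
SAMPLE_HTML = """
<ul class="app-support-list">
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a href="http://help.loox.io">FAQ</a></span>
  </li>
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-website-url" /> </svg><a href="https://loox.app">Developer website</a></span>
  </li>
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-privacy" /> </svg><a href="https://loox.io/legal/privacy_policy.pdf">Privacy policy</a></span>
  </li>
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@example.com</span>
  </li>
</ul>
"""


def email_by_position(html: str) -> str:
    """Return the text of the fourth support-list item (index 3)."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find_all(class_="app-support-list__item")
    # .text concatenates every text node in the <li>, including the
    # whitespace around the <svg> icon; .strip() trims it off the ends.
    return items[3].text.strip()


print(email_by_position(SAMPLE_HTML))  # support@example.com
```

On the live page you would pass page.text from requests.get instead of SAMPLE_HTML.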

Upvotes: 0

mkk

Reputation: 128

This is how you can target the email in this particular case:

email_elm[3].span.text.replace(" ", "")

This takes the fourth list item (index 3), reads the text of its span, and then removes the spaces that come from the whitespace around the icon.
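Both answers hard-code index 3, which breaks if the support links are added to or reordered. As an alternative, a minimal sketch can locate the item by its email icon reference (#v2-icons-icon-email in the question's HTML) instead of its position. This assumes the live page keeps that icon reference; email_by_icon and the support@example.com address are hypothetical placeholders.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the support list from the question, with a hypothetical
# placeholder address in place of the obscured one.
SAMPLE_HTML = """
<ul class="app-support-list">
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a href="http://help.loox.io">FAQ</a></span>
  </li>
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@example.com</span>
  </li>
</ul>
"""


def email_by_icon(html: str):
    """Return the text next to the email icon, or None if it is not found."""
    soup = BeautifulSoup(html, "html.parser")
    # html.parser keeps the attribute name "xlink:href" as-is, so it can be
    # matched with an ordinary attrs dictionary.
    icon = soup.find("use", attrs={"xlink:href": "#v2-icons-icon-email"})
    if icon is None:
        return None  # layout changed or the email icon is missing
    span = icon.find_parent("span")
    if span is None:
        return None
    # strip=True trims each text node and drops the whitespace-only ones
    # inside the <svg>, leaving just the address.
    return span.get_text(strip=True)


print(email_by_icon(SAMPLE_HTML))  # support@example.com
```

Returning None instead of raising makes it easy to detect when the page layout has changed.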

Upvotes: 1
