Reputation: 25
I'm trying to extract the email address text from a Shopify app listing page, but I can't work out how to isolate it. How can I get the email?
My code:
#importing libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
#send request
url = 'https://apps.shopify.com/loox?surface_detail=all&surface_inter_position=1&surface_intra_position=1&surface_type=category'
page = requests.get(url)
#parse the html
soup = BeautifulSoup(page.text, 'html.parser')
#get email
app_email = []
email_elm = soup.find_all(class_= 'app-support-list__item')
I'm trying to scrape the "[email protected]" email from https://apps.shopify.com/loox. Here is the relevant source from the page:
</div>
<div class="grid__item grid__item--tablet-up-third grid__item--desktop-up-3 grid__item--desktop-up-push-3">
<div class="block app-listing__support-section app-listing__section">
<h3 class="block__heading heading--2">Support</h3>
<div class="block__content">
<ul class="app-support-list">
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a class="ui-external-link" href="http://help.loox.io" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">FAQ<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-website-url" /> </svg><a class="ui-external-link" href="https://loox.app" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Developer website<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-privacy" /> </svg><a class="ui-external-link" href="https://loox.io/legal/privacy_policy.pdf" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Privacy policy<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-email" /> </svg>[email protected]</span>
</li>
Upvotes: 0
Views: 1308
Reputation: 20058
You can index into the list that find_all returns and use .text
to get only the text of that element.
Instead of
email_elm = soup.find_all(class_= 'app-support-list__item')
Try:
email_elm = soup.find_all(class_= "app-support-list__item")[3].text.strip()
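A minimal sketch of this approach, run against a trimmed copy of the markup from the question so it works without a network request. The address `support@example.com` is a placeholder, not the real one, and index 3 assumes the email stays the fourth item in the support list.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the support list from the question; the address is a placeholder.
html = """
<ul class="app-support-list">
  <li class="app-support-list__item"><span><a href="http://help.loox.io">FAQ</a></span></li>
  <li class="app-support-list__item"><span><a href="https://loox.app">Developer website</a></span></li>
  <li class="app-support-list__item"><span><a href="https://loox.io/legal/privacy_policy.pdf">Privacy policy</a></span></li>
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@example.com</span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
items = soup.find_all(class_="app-support-list__item")

# find_all returns a list-like ResultSet, so index 3 is the fourth <li>.
# Guard against pages that list fewer support items.
app_email = items[3].text.strip() if len(items) > 3 else None
print(app_email)
```

The length check is only there so a page with a shorter support list yields None instead of raising IndexError.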
Upvotes: 0
Reputation: 128
This is how you can target the email in this particular case:
email_elm[3].span.text.replace(" ", "")
This takes the fourth list item (index 3) from the email_elm list in the question, reads the text inside its span, and then removes the spaces.
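Relying on position 3 breaks if an app lists fewer or more support links. As an alternative to the indexing shown here, and not something either answer proposes, the sketch below looks for the list item whose icon reference ends in `icon-email`, as in the markup from the question. The helper name `find_support_email` and the address `support@example.com` are hypothetical, and the sample HTML is a trimmed stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page source; the address is a placeholder.
html = """
<ul class="app-support-list">
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a href="http://help.loox.io">FAQ</a></span>
  </li>
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@example.com</span>
  </li>
</ul>
"""


def find_support_email(soup):
    """Return the text of the support item marked with the email icon, or None."""
    for item in soup.find_all(class_="app-support-list__item"):
        use = item.find("use")
        # The email entry is the one whose icon reference ends in "icon-email".
        if use is not None and use.get("xlink:href", "").endswith("icon-email"):
            # strip=True drops the whitespace-only strings left by the <svg>.
            return item.get_text(strip=True)
    return None


soup = BeautifulSoup(html, "html.parser")
print(find_support_email(soup))
```

This depends on the icon naming staying stable, so it is a trade-off and not a guarantee: it survives reordering of the list but not a change to the icon identifiers.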
Upvotes: 1