Reputation: 123
I'm trying to get name and contact number from div and div has three span, but the problem is that sometime div has only one span, some time two and sometime three span.
First span has name.
Second span has other data.
Third span has contact number
Here is HTML
<div class="ds-body-small" id="yui_3_18_1_1_1554645615890_3864">
<span class="listing-field" id="yui_3_18_1_1_1554645615890_3863">beth
budinich</span>
<span class="listing-field"><a href="http://Www.redfin.com"
target="_blank">See listing website</a></span>
<span class="listing-field" id="yui_3_18_1_1_1554645615890_4443">(206)
793-8336</span>
</div>
Here is my Code
try:
name= browser.find_element_by_xpath("//span[@class='listing-field'][1]")
name = name.text.strip()
print("name : " + name)
except:
print("Name are missing")
name = "N/A"
try:
contact_info= browser.find_element_by_xpath("//span[@class='listing-
field'][3]")
contact_info = contact_info.text.strip()
print("contact info : " + contact_info)
except:
print("contact_info are missing")
days = "N/A"
My code is not giving me correct result. Can anyone provide me best possible solution. Thanks
Upvotes: 1
Views: 167
Reputation: 12255
You can iterate throw contacts and check, if there's child a
element and if match phone number pattern:
contacts = browser.find_elements_by_css_selector("span.listing-field")
contact_name = []
contact_phone = "N/A"
contact_web = "N/A"
for i in range(0, len(contacts)):
if len(contacts[i].find_elements_by_tag_name("a")) > 0:
contact_web = contacts[i].find_element_by_tag_name("a").get_attribute("href")
elif re.search("\\(\\d+\\)\\s+\\d+-\\d+", contacts[i].text):
contact_phone = contacts[i].text
else:
contact_name.append(contacts[i].text)
contact_name = ", ".join(contact_name) if len(contact_name) > 0 else "N/A"
Output:
contact_name: ['Kevin Howard', 'Howard enterprise']
contact_phone: '(206) 334-8414'
The page has captcha. To scrape better to use requests, all information provided in json format.
Upvotes: 3
Reputation: 46
#sudharsan
# April 07 2019
from bs4 import BeautifulSoup
text ='''<div class="ds-body-small" id="yui_3_18_1_1_1554645615890_3864">
<span class="listing-field" id="yui_3_18_1_1_1554645615890_3863">beth
budinich</span>
<span class="listing-field"><a href="http://Www.redfin.com"
target="_blank">See listing website</a></span>
<span class="listing-field" id="yui_3_18_1_1_1554645615890_4443">(206)
793-8336</span>
</div>'''
# the given sample html is stored as a input in variable called "text"
soup = BeautifulSoup(text,"html.parser")
main = soup.find(class_="listing-field")
# Now the spans with class name "listing-field" is stored as list in "main"
print main[0].text
# it will print the first span element
print main[-1].text
# it will print the last span element
#Thank you
# if you like the code "Vote for it"
Upvotes: 0