Zubair Farooq

Reputation: 123

How can I get only the name and contact number from a div?

I'm trying to get the name and contact number from a div that contains spans. The problem is that the div sometimes has only one span, sometimes two, and sometimes three.

Here is the HTML:

<div class="ds-body-small" id="yui_3_18_1_1_1554645615890_3864">
 <span class="listing-field" id="yui_3_18_1_1_1554645615890_3863">beth budinich</span>
 <span class="listing-field"><a href="http://Www.redfin.com" target="_blank">See listing website</a></span>
 <span class="listing-field" id="yui_3_18_1_1_1554645615890_4443">(206) 793-8336</span>
</div>

Here is my code:

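from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

try:
    # First span with class "listing-field" in its div
    name = browser.find_element(By.XPATH, "//span[@class='listing-field'][1]")
    name = name.text.strip()
    print("name : " + name)
except NoSuchElementException:
    print("Name is missing")
    name = "N/A"

try:
    # Assumes the phone number is always the third "listing-field" span
    contact_info = browser.find_element(By.XPATH, "//span[@class='listing-field'][3]")
    contact_info = contact_info.text.strip()
    print("contact info : " + contact_info)
except NoSuchElementException:
    print("Contact info is missing")
    contact_info = "N/A"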

My code does not give me the correct result. Can anyone suggest the best possible solution? Thanks.

Upvotes: 1

Views: 167

Answers (2)

Sers

Reputation: 12255

You can iterate through the contact spans and check whether each one has a child <a> element or matches a phone-number pattern:

import re
from selenium.webdriver.common.by import By

contacts = browser.find_elements(By.CSS_SELECTOR, "span.listing-field")

contact_name = []
contact_phone = "N/A"
contact_web = "N/A"

for contact in contacts:
    links = contact.find_elements(By.TAG_NAME, "a")
    if links:
        # A span containing a link holds the listing website
        contact_web = links[0].get_attribute("href")
    elif re.search(r"\(\d+\)\s+\d+-\d+", contact.text):
        # A span like "(206) 793-8336" holds the phone number
        contact_phone = contact.text
    else:
        # Anything else is treated as part of the name
        contact_name.append(contact.text)

contact_name = ", ".join(contact_name) if contact_name else "N/A"

Example output for a listing with two name spans:

contact_name: 'Kevin Howard, Howard enterprise'
contact_phone: '(206) 334-8414'

The page has a CAPTCHA. For scraping, it is better to use … , where all the information is provided in JSON format.
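Because the page shows a CAPTCHA, it can help to test the classification rules offline first. Below is a minimal sketch, assuming the markup matches the sample HTML in the question. It applies the same rules as the Selenium loop above: a span with a link is the website, a span matching the phone pattern is the phone, and anything else is the name. Instead of Selenium, it uses only Python's standard html.parser module. ListingFieldParser, classify_listing and SAMPLE are hypothetical names. The phone pattern is relaxed to \s* (instead of \s+) so that a number without a space also matches.

```python
import re
from html.parser import HTMLParser

# Same pattern as the Selenium answer, relaxed to \s* (assumption) so that
# "(206)793-8336" also matches.
PHONE_RE = re.compile(r"\(\d+\)\s*\d+-\d+")


class ListingFieldParser(HTMLParser):
    """Hypothetical helper: collects (text, has_link) for each span.listing-field."""

    def __init__(self):
        super().__init__()
        self.fields = []      # one (text, has_link) tuple per listing-field span
        self._in_field = False
        self._depth = 0       # spans nested inside the current listing-field span
        self._chunks = []
        self._has_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            if self._in_field:
                self._depth += 1
            elif "listing-field" in (dict(attrs).get("class") or "").split():
                self._in_field, self._depth = True, 0
                self._chunks, self._has_link = [], False
        elif tag == "a" and self._in_field:
            self._has_link = True

    def handle_endtag(self, tag):
        if tag != "span" or not self._in_field:
            return
        if self._depth:
            self._depth -= 1
            return
        # Collapse line breaks and repeated spaces, as a browser renders them.
        text = " ".join("".join(self._chunks).split())
        self.fields.append((text, self._has_link))
        self._in_field = False

    def handle_data(self, data):
        if self._in_field:
            self._chunks.append(data)


def classify_listing(html):
    """Return (name, phone) for one listing, whatever number of spans it has."""
    parser = ListingFieldParser()
    parser.feed(html)
    parser.close()
    names, phone = [], "N/A"
    for text, has_link in parser.fields:
        if has_link or not text:
            continue  # the website span (or an empty span) is neither name nor phone
        if PHONE_RE.search(text):
            phone = text
        else:
            names.append(text)
    return (", ".join(names) if names else "N/A"), phone


SAMPLE = """<div class="ds-body-small">
 <span class="listing-field">beth
 budinich</span>
 <span class="listing-field"><a href="http://Www.redfin.com" target="_blank">See listing website</a></span>
 <span class="listing-field">(206)
 793-8336</span>
</div>"""

print(classify_listing(SAMPLE))  # ('beth budinich', '(206) 793-8336')
```

Because each span is classified by its content rather than its position, a div with one, two or three spans all work the same way. A missing field simply stays "N/A".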

Upvotes: 3

sudharsan

Reputation: 46

#sudharsan
# April 07 2019
from bs4 import BeautifulSoup
text ='''<div class="ds-body-small" id="yui_3_18_1_1_1554645615890_3864">
<span class="listing-field" id="yui_3_18_1_1_1554645615890_3863">beth 
budinich</span>
<span class="listing-field"><a href="http://Www.redfin.com" 
target="_blank">See listing website</a></span>
<span class="listing-field" id="yui_3_18_1_1_1554645615890_4443">(206) 
793-8336</span>
</div>'''
# the sample HTML from the question is stored as input in the variable "text"
soup = BeautifulSoup(text, "html.parser")
main = soup.find_all(class_="listing-field")
# find_all returns every element with class "listing-field" as a list
print(" ".join(main[0].get_text().split()))
# prints the first span (the name): beth budinich
print(" ".join(main[-1].get_text().split()))
# prints the last span (the phone number): (206) 793-8336
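Taking the first and last span returns the same span twice when the div holds only one span. When the phone span is missing, it returns the website link as the "phone". Below is a minimal sketch of a guarded version that keeps each guess only when it looks right. It assumes the span texts have already been extracted, for example with find_all as above. first_and_last and PHONE_RE are hypothetical names, and the pattern assumes US-style numbers such as (206) 793-8336.

```python
import re

# Assumption: US-style numbers such as "(206) 793-8336" (space optional).
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


def first_and_last(span_texts):
    """Hypothetical helper: first span as name, last span as phone,
    but only when each guess actually looks right.

    span_texts could come from, e.g.,
    [s.get_text() for s in soup.find_all(class_="listing-field")]
    """
    # Normalise whitespace and drop empty spans.
    texts = [" ".join(t.split()) for t in span_texts]
    texts = [t for t in texts if t]
    if not texts:
        return "N/A", "N/A"
    name = texts[0] if not PHONE_RE.search(texts[0]) else "N/A"
    phone = texts[-1] if PHONE_RE.search(texts[-1]) else "N/A"
    return name, phone


print(first_and_last(["beth \nbudinich", "See listing website", "(206) \n793-8336"]))
# ('beth budinich', '(206) 793-8336')
print(first_and_last(["beth budinich"]))
# ('beth budinich', 'N/A')
```

This still cannot tell a link-only first span from a name. Checking for a child <a> element, as the Selenium answer does, covers that case.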

Upvotes: 0
