user3891081

Reputation: 93

How to scrape text between br tags

I am trying to scrape a simple website http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203

I have tried the following code to scrape name and address:

import lxml.html as lh
from selenium import webdriver
import time

browser = webdriver.Firefox()

browser.get('http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203')

time.sleep(5)

content = browser.page_source

tree = lh.fromstring(content)

name=tree.xpath('//table[@id="collection_poi"]/tbody/tr/td[@align="left"]/a/text()')

address=tree.xpath('//table[@id="collection_poi"]/tbody/tr/td[@align="left"]/text()')

print(name,address)

The names come back correctly, but the address list contains a lot of unwanted data. I only need the name and the address.

What am I doing wrong?

Upvotes: 0

Views: 1025

Answers (1)

Md. Mohsin

Reputation: 1832

Strip the whitespace from each item:

address=[c.strip() for c in address]

Hope that helps.
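
To see why stripping helps, here is a minimal sketch on a made-up fragment. The markup and the store details in `SAMPLE` are assumptions for illustration, not copied from the actual page. `text()` on the cell returns each run of text separated by the `<br>` tags, together with the surrounding newlines and indentation:

```python
import lxml.html as lh

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE = """<table id="collection_poi">
  <tr>
    <td align="left">
      <a class="store_name" href="#">Example Store</a><br>
      123 Main St<br>
      Milwaukee, WI 53203
    </td>
  </tr>
</table>"""

tree = lh.fromstring(SAMPLE)

# Direct text children of the cell: the pieces separated by <br>, with raw whitespace.
raw = tree.xpath('//table[@id="collection_poi"]//td[@align="left"]/text()')

# Strip whitespace from every piece; whitespace-only pieces become empty strings.
stripped = [c.strip() for c in raw]
print(stripped)
```

The link text is not part of the result because `/text()` only selects text that is a direct child of the `td`, not text inside the `a` element.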

But I am wondering: why would you extract one whole list of names and another whole list of addresses? Wouldn't you rather process the table row by row, something like this:

import lxml.html as lh
from selenium import webdriver
import time

browser = webdriver.Firefox()
browser.get('http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203')
time.sleep(5)
content = browser.page_source
tree = lh.fromstring(content)

for tr in tree.xpath('//*[@id="collection_poi"]//tr'):
    name=tr.xpath('.//*[@class="store_name"]//text()')
    name=[c.strip() for c in name]
    address=tr.xpath('.//*[@align="left"]//text()')
    address=[c.strip() for c in address]
    print(name,address)

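You may also want to remove the empty strings from the resulting list:

# filter(None, ...) drops falsy items; list() is needed on Python 3, where filter returns an iterator
address = list(filter(None, address))
print(address)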
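
The two cleanup steps can also be combined into one small helper. This is only a sketch: `clean_text_nodes` is a hypothetical name, and the sample list merely imitates what `text()` might return for a cell that uses `<br>` tags.

```python
def clean_text_nodes(nodes):
    """Strip each text node and drop the ones that end up empty."""
    stripped = (node.strip() for node in nodes)
    return [text for text in stripped if text]

# Sample input imitating the raw text nodes of an address cell (made up for illustration).
raw_address = ['\n  ', '123 Main St', '\n  Milwaukee, WI 53203\n', '']
address = clean_text_nodes(raw_address)
print(address)
```

Inside the loop you could then write `address = clean_text_nodes(tr.xpath('.//*[@align="left"]//text()'))`.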

Hope that helps :-)

Upvotes: 3
