I am trying to scrape this simple website: http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203
I have tried the following code to scrape the names and addresses:
import lxml.html as lh
from selenium import webdriver
import time
browser = webdriver.Firefox()
browser.get('http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203')
time.sleep(5)
content = browser.page_source
tree = lh.fromstring(content)
name=tree.xpath('//table[@id="collection_poi"]/tbody/tr/td[@align="left"]/a/text()')
address=tree.xpath('//table[@id="collection_poi"]/tbody/tr/td[@align="left"]/text()')
print(name,address)
I am getting the names correctly, but the address list contains a lot of unwanted data. I only need the name and the address.
Where am I going wrong?
Strip the surrounding whitespace from each string:
address=[c.strip() for c in address]
Hope that helps.
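Why stripping is needed: `td/text()` returns every text node sitting directly inside the cell, and the indentation and line breaks between tags such as `<a>` and `<br/>` are text nodes too. The sketch below assumes a hypothetical cell layout (the real page's HTML isn't shown here) and uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml: collecting the cell's own `.text` plus each child's `.tail` roughly mirrors what lxml's `td/text()` gives back.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one result cell; only the general shape (a link,
# then text separated by <br/> tags, with indentation in between) is assumed.
CELL = """<td align="left">
    <a href="#">Example Store</a>
    <br/>
    123 Example St
    <br/>
    Some City, ST 12345
</td>"""

td = ET.fromstring(CELL)

# Roughly what lxml's td/text() returns: the cell's leading text plus the
# tail text after each child element. The link's own text is NOT included.
raw = [td.text] + [child.tail for child in td]
raw = [t for t in raw if t is not None]

# Stripping removes the padding, but whitespace-only nodes become "".
stripped = [t.strip() for t in raw]

print(raw)
print(stripped)
```

Note that the whitespace-only nodes turn into empty strings after stripping rather than disappearing, which is why you may still want to filter them out afterwards.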
But I am just wondering: why would you extract one list of all the names and a separate list of all the addresses? Wouldn't you rather go through the results one row at a time, something like this:
import lxml.html as lh
from selenium import webdriver
import time

browser = webdriver.Firefox()
browser.get('http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203')
time.sleep(5)  # simple fixed wait so the results have time to load
content = browser.page_source
browser.quit()  # the HTML is saved, so the browser can be closed

tree = lh.fromstring(content)
# Process each result row separately so every name stays paired with its address
for tr in tree.xpath('//*[@id="collection_poi"]//tr'):
    name = tr.xpath('.//*[@class="store_name"]//text()')
    name = [c.strip() for c in name]
    address = tr.xpath('.//*[@align="left"]//text()')
    address = [c.strip() for c in address]
    print(name, address)
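The same row-by-row idea can be tried offline. This is a sketch, not the answer's exact code: it uses the standard library's `xml.etree.ElementTree` instead of lxml, and runs against a hypothetical, well-formed stand-in for the results table (only `collection_poi`, `store_name` and `align="left"` come from the XPath expressions above). One deliberate difference: `//text()` on the whole cell also picks up the link text, so here the address is built only from text sitting directly inside the cell, which keeps the store name out of it. `clean` and `parse_rows` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the results table; the real markup may differ.
TABLE = """<table id="collection_poi">
  <tr>
    <td align="left">
      <a class="store_name" href="#">Store One</a><br/>
      1 First Ave<br/>
      Some City, ST 12345
    </td>
  </tr>
  <tr>
    <td align="left">
      <a class="store_name" href="#">Store Two</a><br/>
      2 Second Ave<br/>
      Some City, ST 12345
    </td>
  </tr>
</table>"""


def clean(texts):
    """Strip every text node and drop the ones that are missing or empty."""
    return [t.strip() for t in texts if t and t.strip()]


def parse_rows(root):
    """Return a list of (name, address_lines) tuples, one per result row."""
    rows = []
    for tr in root.iter("tr"):
        cell = tr.find("td[@align='left']")
        if cell is None:
            continue  # skip rows without an address cell
        link = cell.find(".//*[@class='store_name']")
        name = link.text.strip() if link is not None and link.text else ""
        # Address: the cell's own text plus the tails after each child,
        # which excludes the link's text (the store name).
        address = clean([cell.text] + [child.tail for child in cell])
        rows.append((name, address))
    return rows


for name, address in parse_rows(ET.fromstring(TABLE)):
    print(name, address)
```

Returning tuples keeps each name attached to its own address, so the two can't drift out of step the way two independent lists can.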
You may also want to remove the empty strings left over in the address list:
address = list(filter(None, address))  # list() is needed in Python 3, where filter returns an iterator
print(address)
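A note on `filter` in Python 3: it returns a lazy iterator rather than a list, so printing the result directly shows a `filter` object instead of the strings. The sketch below uses hypothetical already-stripped values to show `filter(None, ...)` (which keeps only truthy items, and empty strings are falsy), the equivalent list comprehension, and joining the remaining pieces into one display string.

```python
# Hypothetical text nodes after [c.strip() for c in address]
address = ["", "", "123 Example St", "", "Some City, ST 12345"]

# filter(None, ...) keeps truthy items only; "" is falsy, so it is dropped.
lazy = filter(None, address)
print(type(lazy).__name__)  # filter  (an iterator, not a list)

non_empty = list(filter(None, address))

# Equivalent list comprehension, often considered more readable:
non_empty_lc = [c for c in address if c]

print(", ".join(non_empty))  # 123 Example St, Some City, ST 12345
```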
Hope that helps :-)