How to scrape text between br tags

Question

I am trying to scrape a simple website: http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203

I have tried the following code to scrape name and address:

import lxml.html as lh    
from selenium import webdriver    
import time

browser = webdriver.Firefox()

browser.get('http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203')

time.sleep(5)

content = browser.page_source

tree = lh.fromstring(content)

name=tree.xpath('//table[@id="collection_poi"]/tbody/tr/td[@align="left"]/a/text()')

address=tree.xpath('//table[@id="collection_poi"]/tbody/tr/td[@align="left"]/text()')

print(name,address)

The names come back correctly, but the address list contains a lot of unwanted data. I need only the name and address.

What am I doing wrong?

Md. Mohsin · Accepted Answer

Strip the whitespace from each item:

address=[c.strip() for c in address]

Hope that helps.

But I am just wondering: why would you extract the names and addresses as two separate lists covering the whole page? Wouldn't you rather process the table row by row, something like this:

import lxml.html as lh
from selenium import webdriver
import time

browser = webdriver.Firefox()
browser.get('http://hosted.where2getit.com/sharpsiica/index.html?form=locator_search&sku=ARM355&addressline=53203&zip=53203')
time.sleep(5)
content = browser.page_source
tree = lh.fromstring(content)

for tr in tree.xpath('//*[@id="collection_poi"]//tr'):
    name=tr.xpath('.//*[@class="store_name"]//text()')
    name=[c.strip() for c in name]
    address=tr.xpath('.//*[@align="left"]//text()')
    address=[c.strip() for c in address]
    print(name,address)

You may also want to remove empty strings from the resulting list. In Python 3, filter() returns an iterator, so wrap it in list():

address = list(filter(None, address))
print(address)
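
To see the strip-and-filter idea without launching a browser, here is a minimal sketch on a static snippet. It swaps lxml and Selenium for the standard library's xml.etree.ElementTree so it runs on its own. The markup in SAMPLE is invented for illustration and the real page may be structured differently; clean() and extract_rows() are hypothetical helper names. The key point is that text following a <br/> is stored as that element's tail, an attribute lxml elements expose as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration; the real page may differ.
SAMPLE = """<table id="collection_poi">
  <tr>
    <td align="left">
      <a class="store_name">Example Store One</a><br/>
      100 Example Ave<br/>
      Milwaukee, WI 53203
    </td>
  </tr>
  <tr>
    <td align="left">
      <a class="store_name">Example Store Two</a><br/>
      200 Sample Rd<br/>
      Milwaukee, WI 53203
    </td>
  </tr>
</table>"""


def clean(nodes):
    """Strip whitespace from each text node and drop the empty ones."""
    return [s for s in (n.strip() for n in nodes) if s]


def extract_rows(markup):
    """Return one (name, address_lines) pair per table row."""
    root = ET.fromstring(markup)
    rows = []
    for tr in root.iter("tr"):
        cell = tr.find("./td[@align='left']")
        if cell is None:
            continue  # skip rows without an address cell
        link = cell.find("./a[@class='store_name']")
        name = link.text.strip() if link is not None and link.text else ""
        # The text after each child element (<a>, <br/>) is that child's tail,
        # so collecting the tails gives the text between the <br/> tags.
        address = clean((child.tail or "") for child in cell)
        rows.append((name, address))
    return rows


for name, address in extract_rows(SAMPLE):
    print(name, address)
```

Collecting the tails keeps the store name out of the address list, so no separate de-duplication step is needed.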

Hope that helps :-)
