Reputation: 99
I want to scrape the website fundsnetservices.com. Specifically, I want to grab the text below each program, which is about a paragraph's worth.
Using Chrome's Inspect tool, I copied this as the XPath:
'/html/body/div[3]/div/div/div[1]/div/p[2]/text()'
However, every time I print the result, I get an empty list ([]). Why might this be?
import urllib.request
from lxml import etree

response = urllib.request.urlopen('http://www.fundsnetservices.com/searchresult/30/International-Grants-&-Funders/18.html')
tree = etree.HTML(response.read().decode('utf-16'))  # assumes the page is UTF-16 encoded
text = tree.xpath('/html/body/div[3]/div/div/div[1]/div/p[2]/text()')
Upvotes: 0
Views: 135
Reputation: 5905
It seems your code returns whitespace-only text nodes. Use this XPath instead; it selects the third text node of each p element whose class is "tdclass":
//p[@class="tdclass"]/text()[3]
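To see why the position matters, here is a minimal sketch run against a small inline snippet rather than the live site. The markup in SAMPLE_HTML is hypothetical (the real page's structure is an assumption), and the normalize-space() variant is an alternative I am suggesting, not part of the XPath above. It prints every text node under the paragraph, then the third one, then only the non-blank ones:

```python
from lxml import etree

# Hypothetical markup standing in for one program entry; the real page's
# structure is an assumption made for illustration only.
SAMPLE_HTML = (
    '<html><body><p class="tdclass">\n'
    '<a href="#">Example Program</a>\n'
    '<br>\n'
    'A paragraph describing the program.\n'
    '</p></body></html>'
)

tree = etree.HTML(SAMPLE_HTML)

# Every text node directly under the paragraph, including whitespace-only ones.
all_nodes = tree.xpath('//p[@class="tdclass"]/text()')

# Positional pick: the third text node of each matching <p>.
third = tree.xpath('//p[@class="tdclass"]/text()[3]')

# Position-independent alternative: keep only text nodes that contain
# something other than whitespace, then trim the surrounding newlines.
non_blank = tree.xpath('//p[@class="tdclass"]/text()[normalize-space()]')
descriptions = [node.strip() for node in non_blank]

print(all_nodes)
print(third)
print(descriptions)
```

If the description does not always sit in the third text node on the real page, the normalize-space() filter may be the more robust choice, since it does not depend on how many whitespace nodes come first.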
Upvotes: 1