Reputation: 15

HTML Scraping XPath

I'm trying to scrape some data from a webpage. I managed to extract the names and the prices, but I have a problem with this part (screenshot): https://i.sstatic.net/UhjE8.jpg

I want to print the whole <li></li> section, but the numbers wrapped in <bold></bold> do not show up. Why is this? I'm sure there is some way to print the whole thing.

Here is what I've been doing. The original XPath is:

//*[@id="ad-54132"]/div[2]/ul/li

Which I shortened (so that it prints all the ads no matter what number they are instead of just printing the "54132" ad) to:

squarefeet = tree.xpath('//*/div[2]/ul/li/text()')

And like I said at the beginning, it only prints the text that is not inside <bold></bold>.

Upvotes: 2

Answers (2)

Yogiraj Banerji

Reputation: 51

The following XPath will work:

//*[@id="ad-54132"]/div[2]/ul/li/*

The * at the end selects all the child elements of the li element (such as the bold tags), but not the text nodes that sit directly inside li.
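To see what a trailing * returns in practice, here is a minimal sketch using lxml. The markup is hypothetical (the real page is only visible in the screenshot), and it uses <b> as a stand-in for the bold tag. Note that the result is a list of element objects, so the text still has to be read from each one.

```python
from lxml import html

# Hypothetical markup standing in for one ad's list item.
SAMPLE = "<html><body><ul><li><b>120</b> square feet</li></ul></body></html>"

tree = html.fromstring(SAMPLE)

# li/* matches child elements only; the " square feet" text node is skipped.
children = tree.xpath('//ul/li/*')

tags = [el.tag for el in children]    # element names
texts = [el.text for el in children]  # text held inside each child element

print(tags)
print(texts)
```

So this form gives access to the bold numbers, but not to the plain text that sits next to them in the same li.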

Upvotes: 0

har07

Reputation: 89295

With li/text() you only get the text nodes that are direct children of li.

To get all text nodes within li, whether direct children or nested, you can use li//text(). But that returns multiple text nodes for each li, which you might not want.
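The difference between the two expressions can be sketched as follows. The HTML is a made-up stand-in for the page in the question (the ids, the div nesting, and the <b> tag are assumptions), but the XPath expressions are the ones discussed above.

```python
from lxml import html

# Hypothetical markup; only the div[2]/ul/li shape is taken from the question.
SAMPLE = """
<html><body>
<div id="ad-1">
  <div>header</div>
  <div>
    <ul>
      <li><b>120</b> square feet</li>
      <li><b>3</b> bedrooms</li>
    </ul>
  </div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Only text nodes that are direct children of each li.
direct_text = tree.xpath('//*/div[2]/ul/li/text()')

# Every text node below each li, including the ones inside <b>.
all_text = tree.xpath('//*/div[2]/ul/li//text()')

print(direct_text)
print(all_text)
```

With this sample, the second list contains the numbers as well, but as separate entries, so the pieces belonging to one li are no longer grouped together.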

If you want all the text nodes concatenated into a single string for each li, you can call the XPath string() or normalize-space() function on every li element, like so:

squarefeet = [li.xpath('normalize-space(.)') for li in tree.xpath('//*/div[2]/ul/li')]

normalize-space() behaves just like string() in this case, but it also strips leading and trailing whitespace and replaces each run of whitespace with a single space.
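As a rough illustration of that difference, the sketch below runs both functions on the same li. The markup is hypothetical and deliberately padded with line breaks and extra spaces, which is the situation where the two results diverge.

```python
from lxml import html

# Hypothetical list item with untidy whitespace, as often found in real pages.
SAMPLE = """
<html><body>
<ul>
  <li>
    <b>120</b>
    square   feet
  </li>
</ul>
</body></html>
"""

tree = html.fromstring(SAMPLE)
li = tree.xpath('//ul/li')[0]

# string(.) concatenates every descendant text node and keeps whitespace as-is.
raw = li.xpath('string(.)')

# normalize-space(.) does the same, then trims and collapses whitespace.
clean = li.xpath('normalize-space(.)')

print(repr(raw))
print(repr(clean))
```

If the source HTML has no stray whitespace, both calls return the same string, so normalize-space() is usually the safer default when scraping.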

Upvotes: 1
