Reputation: 15
I'm trying to scrape some data from a webpage... I managed to extract the name and the prices but I have a problem here... Photo: https://i.sstatic.net/UhjE8.jpg
I want to print the whole <li></li> section, but the numbers wrapped in <bold></bold> do not show up. Why is this? I'm sure there is some way to print the whole thing.
This is what I've been doing. The original XPath is:
//*[@id="ad-54132"]/div[2]/ul/li
Which I shortened (so that it prints all the ads no matter what number they are instead of just printing the "54132" ad) to:
squarefeet = tree.xpath('//*/div[2]/ul/li/text()')
And like I said in the beginning, it only prints the text that is not inside <bold></bold>.
Upvotes: 2
Views: 228
Reputation: 51
The following XPath will work:
//*[@id="ad-54132"]/div[2]/ul/li/*
The * at the end selects all the child elements of the "li" tag (element nodes only, not text nodes).
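To see what that expression actually returns, here is a minimal sketch using lxml. The markup is made up for illustration (the real page is only visible in the screenshot), and it assumes the numbers sit in a <b> tag. The result is a list of element objects, so you would still have to read their text yourself.

```python
from lxml import html

# Hypothetical markup standing in for the page in the question.
SAMPLE = """<html><body>
<div id="ad-54132">
  <div>Some ad title</div>
  <div>
    <ul>
      <li><b>120</b> square feet</li>
      <li><b>3</b> bedrooms</li>
    </ul>
  </div>
</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# li/* matches child elements of each li, not text nodes.
children = tree.xpath('//*[@id="ad-54132"]/div[2]/ul/li/*')

tags = [el.tag for el in children]    # names of the matched elements
texts = [el.text for el in children]  # text directly inside each matched element

print(tags)
print(texts)
```

With this sample you get the <b> elements and their numbers, but the text that follows each <b> inside the li is not part of the result.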
Upvotes: 0
Reputation: 89295
By using li/text() you'll only get text nodes that are direct children of li.
To get all text nodes within li, whether direct children or nested, you can use li//text(). But that will result in multiple text nodes for each li, which you might not want.
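A quick way to compare the two expressions side by side, as a sketch on made-up markup (the <b> tag and the sample values are assumptions, not taken from the real page):

```python
from lxml import html

# Hypothetical markup; each li holds a nested <b> plus some direct text.
SAMPLE = (
    "<html><body><div>header</div><div><ul>"
    "<li><b>120</b> square feet</li>"
    "<li><b>3</b> bedrooms</li>"
    "</ul></div></body></html>"
)

tree = html.fromstring(SAMPLE)

# Direct text children of li only: the text inside <b> is skipped.
direct = tree.xpath('//*/div[2]/ul/li/text()')

# All descendant text nodes: one entry per text node, so several per li.
nested = tree.xpath('//*/div[2]/ul/li//text()')

print(direct)
print(nested)
```

Here the first list has one entry per li (the part outside <b>), while the second has two entries per li, with the number and the label as separate strings.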
If you want all text nodes concatenated into a single string for each li, you can call the XPath string() or normalize-space() function on every li element, like so:
squarefeet = [li.xpath('normalize-space(.)') for li in tree.xpath('//*/div[2]/ul/li')]
normalize-space() behaves just like string() in this case, except that it also removes leading and trailing whitespace, if any, and replaces each sequence of whitespace characters with a single space.
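To illustrate the difference between the two functions, here is a small sketch on made-up markup where the li content is spread over several indented lines (again, the markup is an assumption for illustration only):

```python
from lxml import html

# Hypothetical markup with line breaks and extra spaces inside the li.
SAMPLE = """<html><body><div>header</div><div><ul>
<li>
    <b>120</b>
    square   feet
</li>
</ul></div></body></html>"""

tree = html.fromstring(SAMPLE)
li = tree.xpath('//*/div[2]/ul/li')[0]

# string(.) concatenates every descendant text node as-is, whitespace included.
raw = li.xpath('string(.)')

# normalize-space(.) does the same, then trims and collapses whitespace.
clean = li.xpath('normalize-space(.)')

print(repr(raw))
print(repr(clean))
```

For scraped pages, normalize-space() is usually the more convenient of the two, since the indentation in the HTML source rarely matters.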
Upvotes: 1