Reputation: 943
I am having a problem with Python and the Scrapy library. When this code:
self.item['char_SP4_TIP'] = response.xpath('//p[contains(@class, "spell-tooltip")]/text()').extract()
runs, it extracts the text from the paragraph, but it splits the text at each <br> tag.
So instead of being able to access the whole text as self.item['char_SP4_TIP'][0], I have to access [0], [1], [2], and so on, for however many <br> tags there are. Is there any way to stop it from splitting the text at the <br> tags? Thanks.
Upvotes: 3
Views: 3008
Reputation: 4002
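Your XPath selects all the text nodes inside the <p>, but a <br> is an element, not a text node, so the text on either side of each <br> comes back as a separate list item.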
<p class='spell-description'> blah <br><br> blah2 </p>
Selects these:                ^^^^          ^^^^^
You can join the split text back together:
texts = response.xpath('//p[contains(@class, "spell-tooltip")]/text()').extract()
text = '\n'.join(texts)
If there are multiple <p> tags with that class, join the text of each one separately:
text = ['\n'.join(p.xpath('./text()').extract())
        for p in response.xpath('//p[contains(@class, "spell-tooltip")]')]
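
The text nodes around a <br> usually carry stray whitespace, so it can help to strip them before joining. Below is a minimal sketch of that idea, not part of the original answer: join_text_nodes and extract_tooltips are hypothetical helper names, and extract_tooltips assumes Scrapy is installed and builds a Selector from a raw HTML string instead of using a spider's response.

```python
def join_text_nodes(texts, sep="\n"):
    """Strip each text node, drop the empty ones, and join the rest."""
    stripped = (t.strip() for t in texts)
    return sep.join(t for t in stripped if t)


def extract_tooltips(html):
    """Return one joined string per matching <p> in an HTML string.

    Hypothetical helper: assumes Scrapy is installed.
    """
    # Imported here so join_text_nodes() stays usable without Scrapy.
    from scrapy.selector import Selector

    selector = Selector(text=html)
    paragraphs = selector.xpath('//p[contains(@class, "spell-tooltip")]')
    # './text()' is relative to each <p>; a leading '/' would restart
    # the search at the document root and match nothing here.
    return [join_text_nodes(p.xpath('./text()').extract())
            for p in paragraphs]


# Text nodes as returned for "<p> blah <br><br> blah2 </p>".
nodes = [" blah ", " blah2 "]
print(join_text_nodes(nodes))
```

In the spider from the question, the same helper could be applied as self.item['char_SP4_TIP'] = join_text_nodes(response.xpath('//p[contains(@class, "spell-tooltip")]/text()').extract()), which stores a single string instead of a list.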
Upvotes: 3