Reputation: 1079
Here is the HTML string in question.
<div class="def ddef_d db">a <a class="query" href="https://dictionary.cambridge.org/us/dictionary/english/book" title="book">book</a> of grammar <a class="query" href="https://dictionary.cambridge.org/us/dictionary/english/rule" title="rules">rules</a>: </div>
With BeautifulSoup, this code
from bs4 import BeautifulSoup
soup = BeautifulSoup(htmltxt, 'lxml')
soup.text
gets me
a book of grammar rules:
which is exactly what I want.
With scrapy, how do I get the same result?
from scrapy import Selector
sel = Selector(text=htmltxt)
sel.css('.ddef_d::text').getall()
This code gets me
['a ', ' of grammar ', ': ']
How should I fix it?
Upvotes: 0
Views: 268
Reputation: 1933
You can use this code to get all the text inside the div and its children:
text = ''.join(sel.css('.ddef_d ::text').getall())
print(text)
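Your selector returns only the text nodes that are direct children of the div, but part of the text is located inside child elements (a). That is why you have to add a space before ::text: the space acts as a descendant combinator, so the text of child elements is included in the result.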
Upvotes: 1