I am trying to extract data. Here is the relevant part of the HTML:
<div class="readable">
<span id="freeTextContainer2123443890291117716">I write because I need to. <br>I review because I want to.
<br>I pay taxes because I have to.
<br><br>If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is [email protected]
<br><br>Founder of
<a target="_blank" href="http://www.thebookwormbox.com" rel="nofollow">www.thebookwormbox.com</a>
<br><br></span>
</div>
I want output like this:
I write because I need to.
I review because I want to.
I pay taxes because I have to.
If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is [email protected]
Founder of www.thebookwormbox.com
I am trying this:
base = '//div[@id="aboutAuthor"]/div[@class="bigBoxBody"]/div[@class="bigBoxContent containerWithHeaderContent"]/div[@class="readable"]'
index = 1 if len(response.xpath(base + '/span')) == 1 else 2  # first span if only one, else the second
aboutauthor = response.xpath(base + '/span[%d]/text()' % index).extract()
print(aboutauthor)
And I get this output:
[u'I write because I need to. ', u'I review because I want to. ',
 u'I pay taxes because I have to. ',
 u'If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is [email protected]',
 u'Founder of ', u' ']
What should I do so that the link text www.thebookwormbox.com is also included in the output?
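As per my comment, you can use //text() instead of /text() at the end of your XPath. /text() only returns the direct text children of the span, so the text inside the <a> element is skipped; //text() returns every descendant text node, including www.thebookwormbox.com.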