V.Khakhil

Reputation: 265

Extract text from nested tags together with the surrounding text as a string in Scrapy

I am trying to extract the author's bio text. Here is the relevant part of the HTML:

     <div class="readable">

        <span id="freeTextContainer2123443890291117716">I write because I need to. <br>I review because I want to. 
    <br>I pay taxes because I have to. 
    <br><br>If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is [email protected]
    <br><br>Founder of 
<a target="_blank" href="http://www.thebookwormbox.com" rel="nofollow">www.thebookwormbox.com</a> 
<br><br></span>

    </div>

I want output like this:

    I write because I need to.
    I review because I want to.
    I pay taxes because I have to.

    If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is [email protected]
    Founder of www.thebookwormbox.com

I am trying this:

    readable = '//div[@id="aboutAuthor"]/div[@class="bigBoxBody"]/div[@class="bigBoxContent containerWithHeaderContent"]/div[@class="readable"]'
    span_index = 1 if len(response.xpath(readable + '/span')) == 1 else 2  # use span[1] if it is the only span, else span[2]
    aboutauthor = response.xpath('%s/span[%d]/text()' % (readable, span_index)).extract()
    print(aboutauthor)

and I get this output:

    [u'I write because I need to. ', u'I review because I want to. ',
     u'I pay taxes because I have to. ',
     u'If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is [email protected]',
     u'Founder of ', u' ']

What should I do so that www.thebookwormbox.com (the text of the nested <a> tag) is included in the output?

Upvotes: 1

Views: 1019

Answers (1)

Anzel

Reputation: 20553

As per my comment, you can use //text() instead of /text() in your XPath to get the text content of all descendant nodes, including the link text inside the nested <a> tag.

Upvotes: 3
