Can't get Scrapy to return text in Div

Question

I'm having trouble getting Scrapy to return the text from this div. When it does return data, it's considerably more than I expected.

Target HTML:

12:00 PM - 09:00 PM

Attempt 1:

    def parse_schedule(self, response):
        s_item = ScheduleItem()

        for sel in response.xpath("//div[@class='DivTimeSpan']"):
            s_item['schedule'] = sel.select('//text()').extract()
        return s_item

Returns:

" ", " ", " var allowedUrls = []; allowedUrls.push("Login.net"); allowedUrls.push("Login"); allowedUrls.push("AccountLogin.net"); allowedUrls.push("AccountLogin"); allowedUrls.push("CreateAccount"); allowedUrls.push("CreateAccount.net"); allowedUrls.push("UpdateAccount"); allowedUrls.push("UpdateAccount.net"); allowedUrls.push("CreateResellersAccount"); allowedUrls.push("CreateResellersAccount.net"); allowedUrls.push("CreateQqestSAASAccount");
"11:00 AM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM"

The entire file is probably thousands of lines long and contains what looks like text from outside the div I specified.

I understood //text() to return the text of the element and its children. The HTML element I'm targeting doesn't have any children, though, so I assumed it would only return the data in the div.

Next I tried just using "/text()". This was the only change.

Attempt 2:

    for sel in response.xpath("//div[@class='DivTimeSpan']"):
        s_item['schedule'] = sel.select('/text()').extract()
    return s_item

Returns:

[{"schedule": []}]

Desired Result:

[{"schedule": ["11:00 AM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM"]}]

The URL I'm scraping is behind a company login, so I can't give out the actual URL.

Elisha's post led me in the right direction. Thanks!!! :)

Answer:

    for sel in response.xpath("//div[@class='DivTimeSpan']"):
        s_item['schedule'] = map(unicode.strip, sel.select('//div/text()').extract())
    return s_item
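A note for anyone on Python 3: `unicode` only exists in Python 2, and `map` returns a lazy iterator there rather than a list. A list comprehension does the same stripping. This is only a minimal sketch; `raw` is a hypothetical stand-in for whatever `.extract()` returns, and dropping the whitespace-only entries is an extra step that the `map` call above does not do.

```python
# Hypothetical stand-in for the list returned by .extract(); values are illustrative.
raw = ["\r\n    11:00 AM - 09:00 PM\r\n  ", "\n12:00 PM - 09:00 PM ", "   "]


def clean(texts):
    """Strip surrounding whitespace and drop entries that are empty afterwards."""
    return [t.strip() for t in texts if t.strip()]


schedule = clean(raw)
print(schedule)
```

In the spider this would be used as `s_item['schedule'] = clean(...)` around the extracted list.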

Elisha · Accepted Answer

The second attempt is closer to extracting the value. Yet, you need to extract the text from the node, and not from the document root:

    s_item['schedule'] = sel.select('/div/text()').extract()[0]

In case the document contains more tags (which are not divs), you can try:

    s_item['schedule'] = sel.select('//div/text()').extract()[0]
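One thing worth keeping in mind: in XPath, a path that starts with `/` or `//` is evaluated from the document root even when it is called on a sub-selector, which is why `//text()` in the question pulled in the script contents. In Scrapy the node-relative forms are `./text()` or plain `text()`. The scoping difference can be seen without Scrapy at all. The sketch below uses the standard library's ElementTree as a stand-in (it supports only a subset of XPath, so `.text` plays the role of `text()`), and the markup is a made-up approximation of the page, not the real one.

```python
import xml.etree.ElementTree as ET

# Made-up approximation of the page; the real markup is not shown in the question.
PAGE = """<html><body>
<script>var allowedUrls = [];</script>
<div class="DivTimeSpan">
  11:00 AM - 09:00 PM
</div>
<div class="DivTimeSpan">
  12:00 PM - 09:00 PM
</div>
</body></html>"""

root = ET.fromstring(PAGE)

# Document-wide: every text node under the root, script contents included.
# This mirrors what an absolute '//text()' query returns.
all_text = [t.strip() for t in root.itertext() if t.strip()]

# Node-relative: first find the target divs, then read only each div's own text.
# This mirrors calling './text()' on each selected div.
schedule = [
    div.text.strip()
    for div in root.iterfind(".//div[@class='DivTimeSpan']")
]

print(all_text)
print(schedule)
```

Assuming the real page is shaped like this, the Scrapy equivalent would be something along the lines of `response.xpath("//div[@class='DivTimeSpan']/text()").extract()`, which collects the text of every matching div in one list.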
