Asandra

Reputation: 1

Stripping text in Scrapy

I'm trying to run a spider to extract information from real estate advertisements.

My code:

import scrapy
from ..items import RealestateItem


class AddSpider (scrapy.Spider):
    name = 'Add'
    start_urls = ['https://www.exampleurl.com/2-bedroom-apartment-downtown-4154251/']

    def parse(self, response):
        items = RealestateItem()

        whole_page = response.css('body')

        for item in whole_page:

            Title = response.css(".obj-header-text::text").extract()
            items['Title'] = Title
            yield items

After running this in the console:

scrapy crawl Add -o Data.csv

In the .csv file I get:

['\n            2-bedroom-apartment         ']

I tried adding the strip method to the call:

Title = response.css(".obj-header-text::text").extract().strip()

But scrapy returns:

Title = response.css(".obj-header-text::text").extract().strip()
AttributeError: 'list' object has no attribute 'strip'

Is there an easy way to make Scrapy write just this to the .csv file:

2-bedroom-apartment

Upvotes: 0

Views: 215

Answers (1)

renatodvc

Reputation: 2564

AttributeError: 'list' object has no attribute 'strip'

You get this error because .extract() returns a list, while .strip() is a string method.


If that selector always returns ONE item, you can use .get() (or extract_first()) instead of .extract(). It returns the first match as a string instead of a list, so you can call .strip() on it. Keep in mind that .get() returns None when nothing matches. Read more here.

If you need it to return a list, you can loop through the list, calling strip on each item:

title = response.css(".obj-header-text::text").extract()
title = [item.strip() for item in title]
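
To make the difference between the two options concrete, here is a minimal sketch in plain Python that needs no Scrapy install. The `extracted` list copies the value shown in the question; `first_stripped` and `all_stripped` are hypothetical helper names, not part of Scrapy. Unlike .get(), the first helper also skips text nodes that contain only whitespace, which is an assumption about what you want in the CSV.

```python
# The value .extract() produced in the question: a list of raw text nodes.
extracted = ['\n            2-bedroom-apartment         ']


def first_stripped(values, default=None):
    """Hypothetical helper: first non-empty stripped string, or `default`."""
    for value in values:
        text = value.strip()
        if text:  # skip nodes that are whitespace only
            return text
    return default


def all_stripped(values):
    """Hypothetical helper: strip every string and drop the empty ones."""
    return [text for text in (value.strip() for value in values) if text]


print(first_stripped(extracted))  # a plain string, suitable for one CSV cell
print(all_stripped(extracted))    # still a list, but without the padding
```

Assigning the plain string to items['Title'] rather than the list should also remove the brackets and quotes from the exported cell.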

You can also use an XPath selector instead of a CSS selector; that way you can use normalize-space, which trims leading and trailing whitespace and collapses internal runs of whitespace into a single space.

title = response.xpath('normalize-space(.//*[@class="obj-header-text"]/text())').extract()

This XPath may need some adjustment; since you didn't post the page source, I couldn't check it.
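
For intuition about what normalize-space does to the text, here is a rough plain-Python approximation. The `normalize_space` function is a hypothetical stand-in, not a Scrapy or lxml API, and the `raw` string is made up for illustration. Python's split() treats a few more characters as whitespace than XPath does, so take this as a sketch rather than an exact equivalent.

```python
def normalize_space(text):
    """Approximate XPath normalize-space(): trim the ends and collapse
    every internal run of whitespace into a single space."""
    return " ".join(text.split())


# Made-up input with padding and a run of spaces inside the text.
raw = '\n            2-bedroom   apartment\n         '

print(repr(normalize_space(raw)))  # internal whitespace is collapsed too
print(repr(raw.strip()))           # strip() only trims the two ends
```

If internal spacing matters to you, .strip() is the safer choice; if you want tidy single-spaced text, normalize-space is more convenient.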

Upvotes: 1
