Chris

Reputation: 1317

Using ItemLoader but adding XPath, values etc. in Scrapy

Currently I'm using the XPathItemLoader to scrape data:

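# Old Scrapy (0.x) import paths, matching the XPathItemLoader used here
from scrapy.contrib.loader import XPathItemLoader
from scrapy.contrib.loader.processor import MapCompose, Join
from w3lib.html import replace_escape_chars

def parse_product(self, response):
    items = []
    l = XPathItemLoader(item=MyItem(), response=response)  # MyItem: the project's Item class
    # Defaults apply to EVERY field unless a field-specific processor overrides them
    l.default_input_processor = MapCompose(lambda v: v.split(), replace_escape_chars)
    l.default_output_processor = Join()
    l.add_xpath('name', 'div[2]/header/h1/text()')
    items.append(l.load_item())
    return items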

I needed v.split() to get rid of some extra whitespace, and that part works fine.
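For reference, this is roughly what the split-then-join combination does to a text value, sketched in plain Python: str.split() with no arguments breaks on any run of whitespace (spaces, tabs, newlines) and drops empty pieces, so joining the pieces with a single space collapses the extra whitespace. It approximates MapCompose(lambda v: v.split(), ...) followed by Join() and leaves out replace_escape_chars; normalize_whitespace is a hypothetical helper name.

```python
def normalize_whitespace(value):
    # Split on any whitespace run (drops leading, trailing and repeated blanks),
    # then rejoin with a single space, which is Join()'s default separator.
    return " ".join(value.split())


print(normalize_whitespace("  Product \n   Name\t "))  # Product Name
```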

But how can I now add the current time?

from time import time
l.add_value('time', time())

only results in this error:

exceptions.AttributeError: 'float' object has no attribute 'split'

Upvotes: 1

Views: 1184

Answers (1)

alecxe

Reputation: 473903

This is because you set default input and output processors, and those are applied to every item field, including time, whose value is a float. The default input processor calls v.split() on each value, and a float has no split() method.
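To make the failure concrete, here is a minimal plain-Python sketch. map_compose is a hypothetical, simplified stand-in for Scrapy's MapCompose (apply each function to every value, flatten list results, drop None); it is not the library's implementation, and newer Scrapy versions wrap the error in a ValueError instead of letting the AttributeError through.

```python
from time import time


def map_compose(*functions):
    """Simplified stand-in for MapCompose (hypothetical helper).

    Applies each function to every value in turn; list results are
    flattened and None results are dropped.
    """
    def process(values):
        for func in functions:
            next_values = []
            for v in values:
                result = func(v)
                if isinstance(result, list):
                    next_values.extend(result)
                elif result is not None:
                    next_values.append(result)
            values = next_values
        return values
    return process


# The question's default input processor, minus replace_escape_chars.
default_in = map_compose(lambda v: v.split())

print(default_in(["  Some   title "]))  # ['Some', 'title']

try:
    default_in([time()])  # the same default processor also runs for 'time'
except AttributeError as exc:
    print(exc)  # 'float' object has no attribute 'split'
```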

You have multiple options:

  • instead of default processors, use field-specific processors:

    l.name_in = MapCompose(lambda v: v.split(), replace_escape_chars)
    l.name_out = Join()
    
  • convert (or format) the time into a string, which the default processors can handle:

    l.add_value('time', str(time()))
    
  • leave the default processors as they are and configure Identity input and output processors for the time field only:

    l.time_in = Identity()
    l.time_out = Identity()
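Why the field-specific options work can be sketched with a toy loader. MiniLoader is hypothetical and heavily simplified; it only mimics the lookup rule the options rely on: a <field>_in / <field>_out attribute on the loader wins, otherwise the default processor is used. Plain functions stand in for MapCompose, Join and Identity.

```python
class MiniLoader:
    """Hypothetical, simplified stand-in for an ItemLoader.

    A '<field>_in' / '<field>_out' attribute overrides the default
    processor for that field only.
    """

    @staticmethod
    def default_input_processor(values):  # identity unless replaced
        return values

    @staticmethod
    def default_output_processor(values):
        return values

    def __init__(self):
        self._values = {}

    def add_value(self, field, value):
        proc = getattr(self, field + "_in", None) or self.default_input_processor
        self._values.setdefault(field, []).extend(proc([value]))

    def load_item(self):
        item = {}
        for field, values in self._values.items():
            proc = getattr(self, field + "_out", None) or self.default_output_processor
            item[field] = proc(values)
        return item


def split_words(values):
    # Stand-in for MapCompose(lambda v: v.split()): fails on non-strings.
    return [word for v in values for word in v.split()]


def identity(values):
    return values


loader = MiniLoader()
loader.default_input_processor = split_words  # applies to every field...
loader.default_output_processor = " ".join
loader.time_in = identity                     # ...unless overridden per field
loader.time_out = identity
loader.add_value("name", "  Some   title ")
loader.add_value("time", 1400000000.5)
print(loader.load_item())  # {'name': 'Some title', 'time': [1400000000.5]}
```

Note that with an identity output processor the float stays wrapped in a one-element list, which is also what Identity() does in Scrapy; TakeFirst() as time_out would store the bare value instead. The string option fits the same model: str(time()) passes through split() and the join unchanged.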
    

Upvotes: 2
