Default value/dealing with empty values when using the ItemLoader in Scrapy

Question

I'm using the ItemLoader to process my scraped data. To preserve the structure and integrity of the original data, and to make database insertion easy, I need to store the empty values that my XPaths sometimes produce.

The problem is that there seems to be no simple way to do this with the ItemLoader, because None values don't even seem to reach the input processor.

For simplicity, consider adding two None values to an item as follows:

loader.add_value('name', None)
loader.add_value('name', None)

These two lines do not affect the item at all, which is not the behavior I want. Instead, I would like item['name'] to gain two new elements, i.e. ["", ""].

I modified the _add_value() and load_item() methods of the ItemLoader class as follows:

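from itemadapter import ItemAdapter
from itemloaders.utils import arg_to_iter
from scrapy.loader import ItemLoader


# Illustrative subclass name; the two methods override ItemLoader's own.
class NALoader(ItemLoader):
    def _add_value(self, field_name, value):
        value = arg_to_iter(value)
        processed_value = self._process_input_value(field_name, value)
        # Register the field even when the processed value is empty,
        # so that load_item() still visits it.
        self._values.setdefault(field_name, [])
        self._values[field_name] += arg_to_iter(processed_value)

    def load_item(self):
        adapter = ItemAdapter(self.item)
        for field_name in tuple(self._values):
            value = self.get_output_value(field_name)
            if value:
                adapter[field_name] = value
            else:
                # Fields that collected nothing get a placeholder.
                adapter[field_name] = "NA"
        return adapter.item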

This at least keeps empty fields from disappearing (they are set to "NA"), but I don't know what side effects this change might have, and it doesn't really solve my problem, since I want to store all of the empty values.

One solution is, of course, to not use the ItemLoader at all and instead check whether each response.xpath() call returned anything. However, that would make my project a lot messier, which I would like to avoid if possible.

Any ideas?
