iw2fs

Reputation: 19

How should I choose between input_processor and output_processor in Scrapy? I don't see any difference between them, since the values end up in the pipeline either way.

The documentation says that the input processor processes the extracted data as soon as it is received, whereas the output processor is called with the data previously collected (and already processed by the input processor). The result of the output processor is the final value that gets assigned to the item.

So when should I use one rather than the other? I'm really confused now.

Also, is there any difference between defining a processor in the ItemLoader class and defining it in the field?

Upvotes: 2

Views: 414

Answers (1)

malberts

Reputation: 2536

The key difference is that the input processor runs on each list of selected values separately, whereas the output processor runs once on the combined list of all the values returned by the input processors. That distinction is not apparent when you attach only a single selector to a field. However, if you add multiple selectors to the same field (as in the documentation's example), you'll notice it: in that scenario you can only make a final decision on which value(s) to keep once you have access to all of them.
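To make that concrete, here is a minimal sketch in plain Python that models the flow for a single field. It is not Scrapy's actual implementation: `TinyLoader`, `strip_all`, and `pick_longest` are hypothetical names made up for illustration, and the sample strings are invented.

```python
# Simplified model of how a loader treats one field. Not Scrapy's real code.

def strip_all(values):
    """Input processor: clean every value extracted by one selector."""
    return [v.strip() for v in values]


def pick_longest(values):
    """Output processor: choose one value, which needs the full list."""
    return max(values, key=len, default=None)


class TinyLoader:
    def __init__(self, input_processor, output_processor):
        self.input_processor = input_processor
        self.output_processor = output_processor
        self.collected = []

    def add_values(self, values):
        # Runs once per call, and only sees the values from that call.
        self.collected.extend(self.input_processor(values))

    def load(self):
        # Runs once at the end, on everything collected for the field.
        return self.output_processor(self.collected)


loader = TinyLoader(strip_all, pick_longest)
loader.add_values(["  Widget "])                # values from a first selector
loader.add_values([" Blue Widget (2-pack)  "])  # values from a second selector
name = loader.load()
print(name)
```

The input processor never sees both selectors' results at the same time, so a choice such as "keep the longest name" can only be made in the output processor.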

Generally you would use input processors to do text preprocessing on the values (like changing case, stripping spaces, etc.), whereas the output processor is for selecting the final value(s).

Of course, you're not required to define either if you don't have a specific need. A typical setup is to have no input processors and just a single TakeFirst output processor, for when each field should hold a single value.
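As a rough illustration of why TakeFirst is such a common choice, the sketch below uses plain-Python stand-ins rather than the real Scrapy classes. `identity` and `take_first` are hypothetical functions written to mirror, as far as I know, the behaviour of Scrapy's `Identity` (the default output processor) and `TakeFirst` (first value that is neither `None` nor an empty string).

```python
def identity(values):
    """Stand-in for the default output processor: return the list unchanged."""
    return values


def take_first(values):
    """Stand-in for TakeFirst: first value that is not None or ''."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


extracted = ["", "Blue Widget"]  # made-up values collected for one field

as_list = identity(extracted)
as_single = take_first(extracted)

print(as_list)    # the field would hold a list
print(as_single)  # the field would hold a single string
```

Without an output processor the field ends up holding a list even when only one value was extracted, which is usually not what you want for fields like a name or a price.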

Also, while it is possible to perform that same text preprocessing in the output processor, it is better to keep things separate in case you plan on reusing processors.

Regarding where you define the processors: it affects the precedence order (as mentioned here). However, that only really comes into play when you start reusing processors and loaders for different items and want certain ones to be overridden. For a single item and a single loader there's no real practical difference.
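For reference, my understanding of the documented precedence is: a field-specific attribute on the loader (`<field>_in` / `<field>_out`) wins, then the processor declared in the field's metadata (`input_processor` / `output_processor`), then the loader's default (`default_input_processor` / `default_output_processor`). The lookup can be sketched in plain Python as follows. `resolve_processor` and the dictionaries are hypothetical, and strings stand in for real processor objects so the result is easy to read.

```python
def resolve_processor(loader_attrs, field_meta, field_name, kind):
    """Return the processor that would apply; kind is 'in' or 'out'."""
    # 1. Field-specific attribute on the loader, e.g. name_in / name_out.
    specific = loader_attrs.get(f"{field_name}_{kind}")
    if specific is not None:
        return specific
    # 2. Processor declared in the item field's metadata.
    meta_key = "input_processor" if kind == "in" else "output_processor"
    if meta_key in field_meta.get(field_name, {}):
        return field_meta[field_name][meta_key]
    # 3. The loader-wide default.
    return loader_attrs[f"default_{meta_key}"]


loader_attrs = {
    "default_input_processor": "loader default",
    "name_in": "loader name_in",
}
field_meta = {
    "name": {"input_processor": "field metadata"},
    "price": {"input_processor": "field metadata"},
    "sku": {},
}

chosen = {
    field: resolve_processor(loader_attrs, field_meta, field, "in")
    for field in ("name", "price", "sku")
}
print(chosen)
```

With one item and one loader, all three places lead to the same result, so the choice mostly matters once a loader is shared across items or subclassed.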

Upvotes: 3
