Zieng

Reputation: 451

Can one spider handle multiple items and multiple pipelines?

I'm new to Scrapy, and a few things confuse me: what is the relationship between spiders, pipelines, and items?

1. Should one pipeline handle only one specific kind of item, or can it handle multiple kinds?

2. How do I use one spider to crawl multiple kinds of items, or should each spider crawl just one kind?

Upvotes: 1

Views: 757

Answers (1)

Elias Dorneles

Reputation: 23866

An item is a piece of data that has been scraped. You can also call it a record or an entry.

A spider is the component that does the crawling (issuing requests and following links) and the scraping (extracting data items from responses). A spider can schedule as many requests and extract as many items as you want; there isn't any limit.

Item pipelines are an abstraction for processing the items extracted by a spider. The idea is that you combine different "pipes" that the data items pass through, arranged so that together they accomplish whatever you need. Typical use cases for pipelines include applying validation constraints, saving data to a database, and cleaning up the data (e.g., removing HTML tags).
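To make that concrete, here is a minimal sketch of one pipeline that receives more than one kind of item. The item classes (`ArticleItem`, `AuthorItem`) and the pipeline name are hypothetical, made up for illustration. It assumes a recent Scrapy version, where plain dataclass objects are accepted as items and a pipeline is an ordinary class with a `process_item(self, item, spider)` method.

```python
import re
from dataclasses import dataclass


# Hypothetical item types, defined only for this illustration.
@dataclass
class ArticleItem:
    title: str
    body: str


@dataclass
class AuthorItem:
    name: str


# Naive tag matcher; good enough for a sketch, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


class CleanTextPipeline:
    """Strips HTML tags from ArticleItem.body; other items pass through unchanged."""

    def process_item(self, item, spider):
        # Every item yielded by a spider reaches this method,
        # so the pipeline decides per item what (if anything) to do.
        if isinstance(item, ArticleItem):
            item.body = TAG_RE.sub("", item.body).strip()
        # Returning the item hands it on to the next pipeline.
        return item
```

So a single pipeline can handle several kinds of items by checking the type, and it can just as well ignore the ones it doesn't care about by returning them untouched.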

So, recapping:

Spiders extract data items, which Scrapy sends one by one to the configured item pipelines (if there are any) for post-processing.
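As for what "configured" means: pipelines are enabled through the `ITEM_PIPELINES` setting, which maps each pipeline class path to an integer that determines the order (lower values run first). A sketch, where the module path `myproject.pipelines` and both class names are hypothetical:

```python
# settings.py (sketch): the paths and class names below are made up.
ITEM_PIPELINES = {
    # Runs first: lower number means earlier in the chain.
    "myproject.pipelines.CleanTextPipeline": 300,
    # Runs second, so it receives the items already cleaned.
    "myproject.pipelines.DatabasePipeline": 800,
}
```

Every item from every spider in the project goes through the enabled pipelines in that order.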

Upvotes: 1
