ocean800

Reputation: 3727

Scrapy - Drop item field in pipeline?

I have an item['html'] field that MyExamplePipeline needs, but after processing it doesn't need to be stored in the database by, e.g., MongoDBPipeline. Is there a way in Scrapy to drop just the html field and keep the rest of the item? The field has to be part of the item so that the spider can pass the page HTML to the pipeline, but I can't figure out how to drop it afterwards. I looked at this SO post, which mentions using FEED_EXPORT_FIELDS or fields_to_export, but I don't want to use an item exporter; I just want to feed the item into the next pipeline, MongoDBPipeline. Thanks!

Upvotes: 0

Views: 1158

Answers (1)

Tarun Lalwani

Reputation: 146510

You can easily do that. In your MongoDBPipeline, delete the field before inserting the item:

del item['html']

If that affects the item in another pipeline, use copy.deepcopy to create a copy of the item object, then delete html from the copy before inserting it into MongoDB.
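To make the copy-then-delete approach concrete, here is a minimal sketch. It does not import Scrapy or pymongo: the pipelines are plain classes with a process_item method, and FakeCollection is a hypothetical in-memory stand-in for a MongoDB collection so the example runs anywhere. The html_length field and the example URL are also made up for illustration.

```python
import copy


class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection (assumption, not pymongo)."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


class MyExamplePipeline:
    def process_item(self, item, spider):
        # Use the raw page HTML here; html_length is only an illustrative field.
        item['html_length'] = len(item['html'])
        return item


class MongoDBPipeline:
    def __init__(self, collection=None):
        # A real pipeline would open a MongoDB connection instead.
        self.collection = collection if collection is not None else FakeCollection()

    def process_item(self, item, spider):
        # Copy first so pipelines that run later still see item['html'].
        doc = copy.deepcopy(dict(item))
        doc.pop('html', None)  # like del doc['html'], but no KeyError if absent
        self.collection.insert_one(doc)
        return item


# Run the two pipelines by hand, in the order Scrapy would call them.
item = {'url': 'https://example.com', 'html': '<html></html>'}
item = MyExamplePipeline().process_item(item, spider=None)
mongo = MongoDBPipeline()
item = mongo.process_item(item, spider=None)

print(mongo.collection.docs[0])  # stored document, without 'html'
print('html' in item)            # the original item still carries the field
```

If no later pipeline needs the field, the copy can be skipped and del item['html'] used directly. Either way, MyExamplePipeline has to run before MongoDBPipeline; in the ITEM_PIPELINES setting, pipelines with lower numbers run first.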

Upvotes: 3
