Reputation: 3727
I have an item['html'] field that MyExamplePipeline needs, but once that pipeline has processed it, the field should not be stored in the database by a later pipeline such as MongoDBPipeline. Is there a way in Scrapy to drop just the html field and keep the rest of the item? The field has to be part of the item so the page HTML can be passed from the spider to the pipeline, but I can't figure out how to drop it afterwards. I looked at this SO post, which mentions FEED_EXPORT_FIELDS or fields_to_export, but I don't want to use an item exporter; I just want to feed the item into the next pipeline, MongoDBPipeline. Thanks!
Upvotes: 0
Views: 1158
Reputation: 146510
You can easily do that. In your MongoDBPipeline, do something like the following:
del item['html']
If that affects the item in another pipeline, use copy.deepcopy to create a copy of the item object, then delete html from the copy before inserting it into MongoDB.
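A minimal sketch of the copy-then-delete approach described above. The pipeline bodies are hypothetical: html_length is an invented field standing in for whatever MyExamplePipeline derives from the HTML, and self.stored is a plain list standing in for a real MongoDB collection, so the example runs without Scrapy or pymongo installed.

```python
import copy


class MyExamplePipeline:
    """Hypothetical first pipeline: it needs item['html'] to do its work."""

    def process_item(self, item, spider):
        # Illustrative use of the HTML; 'html_length' is an invented field.
        item['html_length'] = len(item['html'])
        return item


class MongoDBPipeline:
    """Hypothetical storage pipeline: saves everything except 'html'."""

    def __init__(self):
        # Stand-in for a MongoDB collection (assumption for this sketch);
        # a real pipeline would insert the document into the collection.
        self.stored = []

    def process_item(self, item, spider):
        # Work on a deep copy so pipelines that run later still see 'html'.
        doc = copy.deepcopy(dict(item))
        # pop with a default avoids a KeyError if the field is absent.
        doc.pop('html', None)
        self.stored.append(doc)
        # Return the original, unmodified item to the next pipeline.
        return item
```

If no later pipeline needs the field, deleting it from the item in place, as in the one-liner above, is enough. In either case MongoDBPipeline has to run after MyExamplePipeline, which in Scrapy means giving it the higher number in the ITEM_PIPELINES setting.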
Upvotes: 3