Smashed

Reputation: 341

Passing scraped data to a pipeline's __init__ in Scrapy for Python

I am trying to pass the items that contain the title data to my pipelines. Is there a way to do this inside the parse method, since the data gets reset for the next page? I tried super(mySpider, self).__init__(*args, **kwargs) but the data is not passed correctly. I need the title of the webpage as the filename, which is why I need that specific item in there.

Something like this.

    def __init__(self, item):
        self.csvwriter = csv.writer(open(item['title'][0] + '.csv', 'wb'), delimiter=',')
        self.csvwriter.writerow(['Name', 'Date', 'Location', 'Stars',
                                 'Subject', 'Comment', 'Response', 'Title'])

Upvotes: 0

Views: 78

Answers (2)

William Kinaan

Reputation: 28809

The input to any pipeline is your Item. In your case, you would need to pass the name (or any other data) in your Item. Then you should write a pipeline that writes that item to the file system (or a database, or whatever else you want).

Sample code

Let's say your new pipeline is named 'NewPipeline' and lives in the root of your Scrapy project.

In your settings, you would need to define that pipeline like this:

ITEM_PIPELINES = {
    'YourRootDirectory.NewPipeline.NewPipeline': 800,
    # add any other pipelines you have
}

And your pipeline should be like this:

import json

class NewPipeline(object):
    def process_item(self, item, spider):
        name = item['name']
        # serialize the item as one JSON line and write it to its own file
        with open("pathToWhereYouWantToSave" + name, 'wb') as f:
            f.write(json.dumps(dict(item)))
        return item
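
For completeness, here is a minimal sketch of the spider side that populates the 'name' field the pipeline reads. The item class, spider, and XPath below are illustrative assumptions, not taken from the question:

import scrapy

class ReviewItem(scrapy.Item):
    # hypothetical item with the 'name' field the pipeline above expects
    name = scrapy.Field()

class ReviewSpider(scrapy.Spider):
    name = 'reviews'
    start_urls = ['http://example.com']

    def parse(self, response):
        item = ReviewItem()
        # use the page title as the value the pipeline turns into a filename
        item['name'] = response.xpath('//title/text()').extract_first()
        yield item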

Note

You can put your pipeline in any other module.

Upvotes: 2

GHajba

Reputation: 3691

An ItemPipeline works quite differently from what you imagine.

If you look at the docs you can see:

After an item has been scraped by a spider, it is sent to the Item Pipeline which processes it through several components that are executed sequentially.

This means that the header you pass along with an item arrives at the pipeline only with that one item. And since the order of items is not guaranteed by default, you cannot expect that particular item to reach the pipeline first and set the header.

One alternative is to mark this specific item and look for it in your pipeline: until it arrives, buffer the incoming items; once it does, write the title, flush the buffered items, and from then on write each item to your CSV file as it comes. Another alternative is to write the items only when the spider has finished crawling, as in the sketch below.
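
Here is a rough sketch of that last alternative: buffer every item and write the CSV only in close_spider. The field names are guessed from the question's header row, and the 'wb' mode assumes Python 2, as in the question:

import csv

class BufferedCsvPipeline(object):
    def open_spider(self, spider):
        self.items = []

    def process_item(self, item, spider):
        # only collect; nothing is written until the crawl finishes
        self.items.append(item)
        return item

    def close_spider(self, spider):
        if not self.items:
            return
        # derive the filename from the first item's title, as the question wants
        filename = self.items[0]['title'][0] + '.csv'
        with open(filename, 'wb') as f:
            writer = csv.writer(f, delimiter=',')
            writer.writerow(['Name', 'Date', 'Location', 'Stars',
                             'Subject', 'Comment', 'Response', 'Title'])
            for item in self.items:
                writer.writerow([
                    item.get('name'), item.get('date'), item.get('location'),
                    item.get('stars'), item.get('subject'), item.get('comment'),
                    item.get('response'), item.get('title'),
                ])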

However, I wonder why the exported headers aren't simply fixed for the spider you use... but nevertheless this can happen.

Upvotes: 1
