Dpitt1968

Reputation: 109

Modify json in the scrapy pipeline

If I have a JSON document/dictionary (in a Scrapy pipeline), how would I go about nesting everything under a new top-level key and getting rid of the outer brackets?

[
  {
    "date":"2015-11-25",
    "threat_level_id":"1",
    "info":"TEST",
    "analysis":"0",
    "distribution":"0",
    "orgc":"Malware, Inc",
    "Attribute":[
      {
        "type":"md5",
        "category":"Payload delivery",
        "to_ids":true,
        "distribution":"3",
        "value":"35b759347aee663e36f5b91877749349"
      }
    ]
  }
]

I want to wrap it in a new top-level key and drop the outer brackets so that it looks like this:

{
  "Event":{
    "date":"2015-11-25",
    "threat_level_id":"1",
    "info":"TEST",
    "analysis":"0",
    "distribution":"0",
    "orgc":"Oxygen",
    "Attribute":[
      {
        "type":"md5",
        "category":"Payload delivery",
        "to_ids":true,
        "distribution":"3",
        "value":"35b759347aee663e36f5b91877749349"
      }
    ]
  }
}

Thanks natdempk!

With this pipeline I'm getting exceptions.TypeError: expected string or buffer:

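import json

class JsonPipeline(object):
    def process_item(self, item, spider):
        # This line raises the TypeError: json.loads expects a string,
        # but item is not a JSON string here.
        data = json.loads(item)
        new_data = {}
        new_data['Event'] = data
        # Serialize the wrapped structure (new_data), not the original data.
        item = json.dumps(new_data)
        return item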

I'm running the scrapy crawler like this - scrapy crawl spider -o items.json

This works, but I get the following error:

File "/usr/lib/pymodules/python2.7/scrapy/contrib/exporter/__init__.py", line 71, in _get_serialized_fields
    field = item.fields[field_name]
exceptions.AttributeError: 'dict' object has no attribute 'fields'

class JsonWithEncodingPipeline(object):
    def process_item(self, item, spider):
        data = {}
        data['Event'] = item
        return data

If I add this to settings.py the error goes away, but I no longer get any file output:

EXTENSIONS = {'scrapy.contrib.feedexport.FeedExporter': None}

Is there a way to do this without disabling the FeedExporter extension?

Upvotes: 2

Views: 1883

Answers (1)

Nat Dempkowski

Reputation: 2440

You can use Python's json module to parse the JSON into Python objects (dictionaries and lists), modify them, and then serialize the result back to JSON.

This might look something like:

import json

data = json.loads(your_json_data_as_string)

new_data = {}
new_data['Event'] = data

new_json_string = json.dumps(new_data)

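This puts the whole parsed JSON structure under the key Event, which is close to your desired example. Note that if the input is a list, as in your sample, the list brackets are kept inside Event; use data[0] in place of data to nest only the first element.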
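To get rid of the outer brackets as well, the single element of the list can be unwrapped before it is nested. The following is a minimal sketch under a few assumptions: wrap_event and EventPipeline are hypothetical names, the list is assumed to hold exactly one record, and the item passed to the pipeline is assumed to be a mapping already (which would explain why json.loads raised a TypeError). Whether the feed exporter accepts the returned plain dict depends on the Scrapy version; the AttributeError in the question suggests that older releases expect an Item.

```python
import json


def wrap_event(records):
    """Wrap a single record under an "Event" key.

    Accepts either a dict or a one-element list containing a dict,
    which is the shape shown in the question.
    """
    if isinstance(records, list):
        if len(records) != 1:
            raise ValueError("expected exactly one record, got %d" % len(records))
        records = records[0]  # drop the outer brackets
    return {"Event": records}


class EventPipeline(object):
    """Hypothetical pipeline: the item is already a mapping, so no json.loads."""

    def process_item(self, item, spider):
        return wrap_event(dict(item))


# Shortened version of the sample from the question.
raw = '[{"date": "2015-11-25", "info": "TEST", "Attribute": [{"type": "md5"}]}]'
wrapped = wrap_event(json.loads(raw))
print(json.dumps(wrapped, indent=2))
```

Raising on lists with more than one element is a design choice for this sketch; if several records are expected, each one would need its own Event wrapper.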

Upvotes: 2
