Kalyan

Reputation: 1940

Formatting JSON data fetched from a URL by removing escape characters

I fetched JSON data from a URL and wrote it to a file named urljson.json. I want to format the JSON data by removing the '\' characters and the "result" key, as required for my use case. In my JSON file the data looks like this:

{\"result\":[{\"BldgID\":\"1006AVE \",\"BldgName\":\"100-6th Avenue SW (Oddfellows)          \",\"BldgCity\":\"Calgary             \",\"BldgState\":\"AB \",\"BldgZip\":\"T2G 2C4  \",\"BldgAddress1\":\"100-6th Avenue Southwest                \",\"BldgAddress2\":\"ZZZ None\",\"BldgPhone\":\"4035439600     \",\"BldgLandlord\":\"1006AV\",\"BldgLandlordName\":\"100-6 TH Avenue SW Inc.                                     \",\"BldgManager\":\"AVANDE\",\"BldgManagerName\":\"Alyssa Van de Vorst           \",\"BldgManagerType\":\"Internal\",\"BldgGLA\":\"34242\",\"BldgEntityID\":\"1006AVE \",\"BldgInactive\":\"N\",\"BldgPropType\":\"ZZZ None\",\"BldgPropTypeDesc\":\"ZZZ None\",\"BldgPropSubType\":\"ZZZ None\",\"BldgPropSubTypeDesc\":\"ZZZ None\",\"BldgRetailFlag\":\"N\",\"BldgEntityType\":\"REIT                     \",\"BldgCityName\":\"Calgary             \",\"BldgDistrictName\":\"Downtown            \",\"BldgRegionName\":\"Western Canada                                    \",\"BldgAccountantID\":\"KKAUN     \",\"BldgAccountantName\":\"Kendra Kaun                   \",\"BldgAccountantMgrID\":\"LVALIANT  \",\"BldgAccountantMgrName\":\"Lorretta Valiant                        \",\"BldgFASBStartDate\":\"2012-10-24\",\"BldgFASBStartDateStr\":\"2012-10-24\"}]}

I want it in this format:

[  
   {  
      "BldgID":"1006AVE",
      "BldgName":"100-6th Avenue SW (Oddfellows)          ",
      "BldgCity":"Calgary             ",
      "BldgState":"AB ",
      "BldgZip":"T2G 2C4  ",
      "BldgAddress1":"100-6th Avenue Southwest                ",
      "BldgAddress2":"ZZZ None",
      "BldgPhone":"4035439600     ",
      "BldgLandlord":"1006AV",
      "BldgLandlordName":"100-6 TH Avenue SW Inc.                                    ",
      "BldgManager":"AVANDE",
      "BldgManagerName":"Alyssa Van de Vorst           ",
      "BldgManagerType":"Internal",
      "BldgGLA":"34242",
      "BldgEntityID":"1006AVE ",
      "BldgInactive":"N",
      "BldgPropType":"ZZZ None",
      "BldgPropTypeDesc":"ZZZ None",
      "BldgPropSubType":"ZZZ None",
      "BldgPropSubTypeDesc":"ZZZ None",
      "BldgRetailFlag":"N",
      "BldgEntityType":"REIT                     ",
      "BldgCityName":"Calgary             ",
      "BldgDistrictName":"Downtown            ",
      "BldgRegionName":"Western Canada                                    ",
      "BldgAccountantID":"KKAUN     ",
      "BldgAccountantName":"Kendra Kaun                   ",
      "BldgAccountantMgrID":"LVALIANT  ",
      "BldgAccountantMgrName":"Lorretta Valiant                        ",
      "BldgFASBStartDate":"2012-10-24",
      "BldgFASBStartDateStr":"2012-10-24"
   }
]

I have tried replace("\","") but nothing changed. Here is my code:

import json
import urllib2

urllink=urllib2.urlopen("url").read()

# print urllink

with open('urljson.json','w') as outfile:
    json.dump(urllink,outfile)

jsonfile='urljson.json'
jsondata=open(jsonfile)

data=json.load(jsondata)
# data.replace('\'," ")
print (data)

But it says the file object has no replace attribute. I couldn't figure out how to remove 'result' and the outermost "{}". Kindly guide me; I think the file object is somehow not being parsed as a string. I am a beginner in Python. Thank you.

Upvotes: 5

Views: 21270

Answers (3)

Mohammad Yusuf

Reputation: 17074

Tidy up the JSON object before writing it to the file. It has a lot of whitespace noise. Try it like this:

# Strip padding from the keys and values of the first object in "result"
jsonobj = {a.strip(): b.strip() for a, b in json.loads(urllink)['result'][0].items()}

with open('urljson.json','w') as outfile:
    json.dump(jsonobj, outfile)

For all objects:

jsonlist = []

# Strip padding from every object in the "result" list
for dirtyobj in json.loads(urllink)['result']:
    jsonlist.append({a.strip(): b.strip() for a, b in dirtyobj.items()})

with open('urljson.json','w') as outfile:
    json.dump(jsonlist, outfile)
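
The same clean-up can be sketched for Python 3, assuming the payload has the shape shown in the question. The helper name strip_padding and the sample string are made up for illustration. Values that are not strings are left untouched, since calling .strip() on a number would raise AttributeError.

```python
import json


def strip_padding(record):
    """Return a copy of a dict with whitespace stripped from string keys and values."""
    return {
        key.strip(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


# Hypothetical sample in the same shape as the question's payload
raw = '{"result":[{"BldgID":"1006AVE ","BldgState":"AB ","BldgGLA":"34242"}]}'

# Clean every object in the "result" list
cleaned = [strip_padding(obj) for obj in json.loads(raw)["result"]]

# indent=3 gives an indented layout like the one the question asks for
print(json.dumps(cleaned, indent=3))
```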

Don't wanna tidy up? Then simply do this:

jsonobj = json.loads(urllink)

Also, you can't write '\': it's a syntax error. The backslash escapes the second ', so it is not treated as the closing quote.

data.replace('\'," ")

See also: Why can't Python's raw string literals end with a single backslash?
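
To illustrate the quoting rule, here is a small sketch in Python 3 syntax; the sample text is invented. A literal backslash has to be written as '\\', and str.replace returns a new string rather than changing the original. Stripping backslashes by hand is shown only to demonstrate the syntax; decoding with json.loads is the more robust fix.

```python
# '\\' is a one-character string holding a single backslash
backslash = '\\'
print(len(backslash))

# Invented sample: text containing backslash-escaped quotes
escaped = '{\\"BldgState\\":\\"AB \\"}'
print(escaped)

# replace() returns a new string; `escaped` itself is unchanged
unescaped = escaped.replace('\\', '')
print(unescaped)
```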

Upvotes: 1

tdelaney

Reputation: 77407

JSON is a serialized encoding for data. urllink=urllib2.urlopen("url").read() reads that serialized string. With json.dump(urllink,outfile) you serialized that already-serialized JSON string a second time. You double-encoded it, and that's why you see those extra "\" escape characters: json has to escape the inner quotes so they aren't confused with the quotes it uses to delimit strings.
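
The double encoding described above can be reproduced without any network access. This is a minimal sketch in Python 3; the payload string is invented and simply stands in for the text returned by the URL.

```python
import json

# Invented sample standing in for the text returned by the URL
payload = '{"result":[{"BldgState":"AB "}]}'

# Encoding a str that already holds JSON wraps it in quotes and escapes the inner ones
double_encoded = json.dumps(payload)
print(double_encoded)

# One decode only gives the original text back; a second decode gives the data
once = json.loads(double_encoded)
twice = json.loads(once)
print(type(once).__name__, type(twice).__name__)
```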

If you wanted the file to hold the original JSON, you wouldn't need to encode it again. Just do:

with open('urljson.json','w')as outfile:
    outfile.write(urllink)

But it looks like you want to grab the "result" list and only save that. So, decode the JSON into Python, grab the bits you want, and encode it again:

import json
import urllib2

# read a json string from url
urllink = urllib2.urlopen("url").read()

# decode and grab result list
result = json.loads(urllink)['result']

# write the json to a file
with open('urljson.json', 'w') as outfile:
    json.dump(result, outfile)
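
If the file has already been written in double-encoded form, it can be repaired without fetching the URL again. The following is a sketch in Python 3, assuming the file holds a JSON string whose content is itself JSON; the scratch path and the sample payload are made up for illustration.

```python
import json
import os
import tempfile

# Recreate the problem in a scratch file (the sample payload is invented)
payload = '{"result":[{"BldgID":"1006AVE ","BldgState":"AB "}]}'
path = os.path.join(tempfile.mkdtemp(), "urljson.json")
with open(path, "w") as outfile:
    json.dump(payload, outfile)  # double-encodes, as in the question

# Repair: the first load yields a str, so decode that str once more
with open(path) as infile:
    data = json.load(infile)
if isinstance(data, str):
    data = json.loads(data)

# Keep only the "result" list and write it back, indented for readability
with open(path, "w") as outfile:
    json.dump(data["result"], outfile, indent=3)

with open(path) as infile:
    print(infile.read())
```

On Python 3 the urllib2 module no longer exists; urlopen lives in urllib.request and read() returns bytes.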

Upvotes: 5

宏杰李

Reputation: 12168

\ is the escape character in JSON.

You can load a JSON string into a Python dict with json.loads.

Upvotes: 1
