Liao Zhuodi
Liao Zhuodi

Reputation: 3283

parse special json format with python

I want to get GPSLatitude and GPSLongitude value, but I can't use python position because the position is pretty random. I get the value by tag's value, how can I do that?

jsonFlickrApi({ "photo": { "id": "8566959299", "secret": "141af38562", "server": "8233", "farm": 9, "camera": "Apple iPhone 4S", 
    "exif": [
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "JFIFVersion", "label": "JFIFVersion", 
        "raw": { "_content": 1.01 } },
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "ResolutionUnit", "label": "Resolution Unit", 
        "raw": { "_content": "inches" } },
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "XResolution", "label": "X-Resolution", 
        "raw": { "_content": 72 }, 
        "clean": { "_content": "72 dpi" } },
      { "tagspace": "JFIF", "tagspaceid": 0, "tag": "YResolution", "label": "Y-Resolution", 
        "raw": { "_content": 72 }, 
        "clean": { "_content": "72 dpi" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitudeRef", "label": "GPS Latitude Ref", 
        "raw": { "_content": "North" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitude", "label": "GPS Latitude", 
        "raw": { "_content": "39 deg 56' 44.40\"" }, 
        "clean": { "_content": "39 deg 56' 44.40\" N" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitudeRef", "label": "GPS Longitude Ref", 
        "raw": { "_content": "East" } },
      { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitude", "label": "GPS Longitude", 
        "raw": { "_content": "116 deg 16' 10.20\"" }, 
        "clean": { "_content": "116 deg 16' 10.20\" E" } },
    ] }, "stat": "ok" })

Upvotes: 0

Views: 531

Answers (3)

mhawke
mhawke

Reputation: 87054

You don't say whether you using one of the Flickr APIs; I assume not because handling JSON responses is trivial if you are using an API such as flickrapi.

import flickrapi

api_key = '88341066e8f0a40516599d28d8170627'   # from flickr's API explorer
secret = 'sssshhhh'
flickr = flickrapi.FlickrAPI(api_key, secret, format='parsed-json')
response = flickr.photos.getExif(photo_id='8566959299')
lat_long = {exif['tag']: exif['clean']['_content']
                    for exif in response['photo']['exif']
                        if exif['tag'] in (u'GPSLongitude', u'GPSLatitude')}

>>> from pprint import pprint
>>> pprint(lat_long)
{u'GPSLatitude': u'39 deg 56\' 44.40" N',
 u'GPSLongitude': u'116 deg 16\' 10.20" E'}

But continuing with the assumption that you are not using an API, the response format that you are seeing is actually JSONP which is better suited to Javascript than it is Python. You can, however, request a response in JSON representation that does not have the enclosing jsonFlickrApi() function wrapper. Do this by specifying format=json&nojsoncallback=1 in the query parameters of the request. Using the requests library makes requesting and parsing the JSON response easy, but this will work just as well with urllib2.urlopen() combined with json.loads() if you can't use requests e.g.

import requests

params = {'api_key': '88341066e8f0a40516599d28d8170627',
          'api_sig': '7b2dcfb2cd3a747179c2ed0fdc492699',
          'format': 'json',
          'method': 'flickr.photos.getExif',
          'nojsoncallback': '1',
          'photo_id': '8566959299',
          'secret': 'sssshhhh'}    
response = requests.get('https://api.flickr.com/services/rest/', params=params)
data = response.json()
lat_long = {exif['tag']: exif['clean']['_content']
                for exif in data['photo']['exif']
                    if exif['tag'] in (u'GPSLongitude', u'GPSLatitude')}

>>> from pprint import pprint
>>> pprint(lat_long)
{u'GPSLatitude': u'39 deg 56\' 44.40" N',
 u'GPSLongitude': u'116 deg 16\' 10.20" E'}

Upvotes: 3

Oliver W.
Oliver W.

Reputation: 13459

With the exception of the last comma just before the closing bracket ], the entire object returned by the FlickrAPI is valid json.

Assuming that that comma is merely a copy-paste error (example evidence suggests this is the case), then the builtin json module still won't be usable as is. That's because even though a string like "116 deg 16' 10.20\" E" is valid json, python's json module will complain with a ValueError because the double quote " isn't sufficiently quoted:

>>> import json
>>> json.loads('{"a": "2"}')
{u'a': u'2'}
>>> json.loads('{"a": "2\""}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 10 (char 9)

The solution is to add another escaping backslash:

>>> json.loads('{"a": "2\\""}')
{u'a': u'2"'}

For your full jsonFlickrApi response, you could add those extra backslashes with the re module:

>>> import re
>>> response = """jsonFlickrApi({ "photo": { "id": "8566959299", "secret": "141af38562", "server": "8233", "farm": 9, "camera": "Apple iPhone 4S", 
...     "exif": [
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "JFIFVersion", "label": "JFIFVersion", 
...         "raw": { "_content": 1.01 } },
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "ResolutionUnit", "label": "Resolution Unit", 
...         "raw": { "_content": "inches" } },
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "XResolution", "label": "X-Resolution", 
...         "raw": { "_content": 72 }, 
...         "clean": { "_content": "72 dpi" } },
...       { "tagspace": "JFIF", "tagspaceid": 0, "tag": "YResolution", "label": "Y-Resolution", 
...         "raw": { "_content": 72 }, 
...         "clean": { "_content": "72 dpi" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitudeRef", "label": "GPS Latitude Ref", 
...         "raw": { "_content": "North" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLatitude", "label": "GPS Latitude", 
...         "raw": { "_content": "39 deg 56' 44.40\"" }, 
...         "clean": { "_content": "39 deg 56' 44.40\" N" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitudeRef", "label": "GPS Longitude Ref", 
...         "raw": { "_content": "East" } },
...       { "tagspace": "GPS", "tagspaceid": 0, "tag": "GPSLongitude", "label": "GPS Longitude", 
...         "raw": { "_content": "116 deg 16' 10.20\"" }, 
...         "clean": { "_content": "116 deg 16' 10.20\" E" } }
...     ] }, "stat": "ok" })"""
>>> quoted_resp = re.sub('deg ([^"]+)"', r'deg \1\\"', response[14:-1])

That quoted response can then be used in a call to json.loads and you can then easily access the required data in the newly generated dictionary structure:

>>> photodict = json.loads(quoted_resp)
>>> for meta in photodict['photo']['exif']:                                                                                                               
...     if meta["tagspace"] == "GPS" and meta["tag"] == "GPSLongitude":
...         print(meta["clean"]["_content"])
... 
116 deg 16' 10.20" E

Upvotes: 0

Meng Wang
Meng Wang

Reputation: 277

If looking at the whole string as jsonFlickrApi(XXX), XXX is a standard JSON string. With json library, XXX can be converted to python dictionary and then parsed easily.

Upvotes: 0

Related Questions