saby

Reputation: 361

Extract URLs from JSON conditionally

I have multiple JSON files which contain URLs to images. Each image comes in three formats:

  1. SD version - standard quality ("isThumbNail": false, "isHdImage": false)
  2. thumbnail - lowest quality ("isThumbNail": true, "isHdImage": false)
  3. HD - highest quality ("isThumbNail": false, "isHdImage": true)

It looks like this:

{
  "objectId": 1234,
  "imgCount": 3,
  "dataImages": [
    {
      "sequence": 1,
      "link": [
        {
          "url": "http://example.com/SD/image_1.jpg",
          "isThumbNail": false,
          "isHdImage": false
        },
        {
          "url": "http://example.com/THUMB/image_1.jpg",
          "isThumbNail": true,
          "isHdImage": false
        },
        {
          "url": "http://example.com/HD/image_1.jpg",
          "isThumbNail": false,
          "isHdImage": true
        }
      ]
    },
    {
      "sequence": 2,
      "link": [
        {
          "url": "http://example.com/SD/image_2.jpg",
          "isThumbNail": false,
          "isHdImage": false
        },
        {
          "url": "http://example.com/THUMB/image_2.jpg",
          "isThumbNail": true,
          "isHdImage": false
        },
        {
          "url": "http://example.com/HD/image_2.jpg",
          "isThumbNail": false,
          "isHdImage": true
        }
      ]
    },
    {
      "sequence": 3,
      "link": [
        {
          "url": "http://example.com/SD/image_3.jpg",
          "isThumbNail": false,
          "isHdImage": false
        },
        {
          "url": "http://example.com/THUMB/image_3.jpg",
          "isThumbNail": true,
          "isHdImage": false
        },
        {
          "url": "http://example.com/HD/image_3.jpg",
          "isThumbNail": false,
          "isHdImage": true
        }
      ]
    }
  ]
}

I am trying to get the URL of the HD version of each image and append it to the images list. The HD version may be missing from the JSON, in which case I want to download the SD version instead. It may also happen that only a thumbnail is present, or no image at all; in that case the function should return an empty value, or something else safe that will not break the program.

With this code I am able to get all the links where isHdImage is true:

import requests


def get_images(url):
    try:
        images = []
        response = requests.get(url)
        response.raise_for_status()
        data = response.json()

        # 'dataImages' is the key used in the sample JSON above
        for sequence in data['dataImages']:
            for link in sequence['link']:
                if link['isHdImage'] is True:
                    # append the URL itself, not the literal list ['url']
                    images.append(link['url'])

        return images

    except requests.exceptions.HTTPError as err:
        print('HTTPError:', err)

But I am not sure how to reach the solution I described above. Thank you for any advice.

Upvotes: 2

Views: 164

Answers (1)

Ax34

Reputation: 123

If I understand correctly, you could check the flag combination for every image: if isHdImage and isThumbNail are both false, the image is the SD version; if only isHdImage is true, it's HD; if only isThumbNail is true, it's a thumbnail. So you can do something like this:

import requests


def get_images(url):
    resolution_order = ['HD', 'SD', 'TH']  # lower index = better quality
    try:
        images = []
        response = requests.get(url)
        response.raise_for_status()
        data = response.json()

        # 'dataImages' is the key used in the sample JSON in the question
        for sequence in data['dataImages']:
            # Reset for every sequence: start from the worst resolution with
            # an empty URL, so any better link in this sequence replaces it
            # and a result never leaks from one sequence into the next.
            best = [resolution_order[-1], ""]
            for link in sequence['link']:

                if link['isHdImage'] and not link['isThumbNail']:
                    # it's an HD image
                    if resolution_order.index('HD') < resolution_order.index(best[0]):
                        best = ['HD', link['url']]

                elif not link['isHdImage'] and link['isThumbNail']:
                    # it's a thumbnail; 'TH' is never better than the initial
                    # placeholder, so a thumbnail-only sequence yields ""
                    if resolution_order.index('TH') < resolution_order.index(best[0]):
                        best = ['TH', link['url']]

                elif not link['isHdImage'] and not link['isThumbNail']:
                    # it's an SD image
                    if resolution_order.index('SD') < resolution_order.index(best[0]):
                        best = ['SD', link['url']]

            images.append(best[1])  # append the best URL of this sequence

        return images

    except requests.exceptions.HTTPError as err:
        print('HTTPError:', err)

Explanation: we loop over every link of every sequence in the JSON and keep the best candidate found so far in the best list: the resolution label at position 0 and the corresponding URL at position 1. The resolution_order list defines the order of preference for the resolutions.
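If it helps, the same idea can be tried without any HTTP request, on the links of a single sequence. This is only a sketch: the helper names classify and best_url and the accepted parameter are made up for illustration, and it assumes that a thumbnail should never be returned (an empty string is returned instead), as the question asks.

```python
def classify(link):
    """Map the two flags to 'HD', 'SD' or 'TH' (None for an unexpected combination)."""
    hd = link.get('isHdImage', False)
    thumb = link.get('isThumbNail', False)
    if hd and not thumb:
        return 'HD'
    if thumb and not hd:
        return 'TH'
    if not hd and not thumb:
        return 'SD'
    return None  # both flags set: not covered by the question


def best_url(links, accepted=('HD', 'SD')):
    """Return the URL of the best accepted link, or '' if there is none."""
    best_rank, best = len(accepted), ''
    for link in links:
        label = classify(link)
        # a lower index in `accepted` means a better resolution
        if label in accepted and accepted.index(label) < best_rank:
            best_rank, best = accepted.index(label), link['url']
    return best
```

Calling best_url(sequence['link']) for each sequence of the parsed JSON gives one URL (or an empty string) per image.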

For example, if the script first finds an SD image, it assigns ['SD', 'URL'] to best; the index of 'SD' in resolution_order is 1. If it then finds an HD image, the check resolution_order.index('HD') < resolution_order.index(best[0]) is True only if the index of 'HD' in resolution_order is less than the index of the resolution stored at position 0 of best, which in this case is 1 (as said before). The index of 'HD' is 0, so best is replaced with ['HD', 'NEW HD URL']. So even if the links are not ordered by resolution, you still get the best quality according to resolution_order.
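The walk-through can be reproduced in isolation with a few lines. The labels and file names below are placeholders, not real data; the loop only shows how best changes as links arrive in an arbitrary order.

```python
resolution_order = ['HD', 'SD', 'TH']  # lower index = better quality
best = [resolution_order[-1], ""]      # start from the worst resolution

steps = []
# links arrive out of order: SD first, then HD, then another SD
for label, url in [('SD', 'sd.jpg'), ('HD', 'hd.jpg'), ('SD', 'sd2.jpg')]:
    if resolution_order.index(label) < resolution_order.index(best[0]):
        best = [label, url]
    steps.append(list(best))  # record the state after each link

print(steps)
# [['SD', 'sd.jpg'], ['HD', 'hd.jpg'], ['HD', 'hd.jpg']]
```

The last SD link does not replace the HD one, because its index (1) is not less than the index of 'HD' (0).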

Upvotes: 1
