Mariana Rodrigues
Mariana Rodrigues

Reputation: 195

Select specific keys inside a json using python

I have the following json that I extracted using request with python and json.loads. The whole json basically repeats itself with changes in the ID and names. It has a lot of information but I`m just posting a small sample as an example:

"status":"OK",
   "statuscode":200,
   "message":"success",
   "apps":[
      {
         "id":"675832210",
         "title":"AGED",
         "desc":"No annoying ads&easy to play",
         "urlImg":"https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
         "urlImgWide":"https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
         "urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
         "androidPackage":"com.agedstudio.freecell",
         "revenueType":"cpi",
         "revenueRate":"0.10",
         "categories":"Card",
         "idx":"2",
         "country":[
            "CH"
         ],
         "cityInclude":[
            "ALL"
         ],
         "cityExclude":[
            
         ],
         "targetedOSver":"ALL",
         "targetedDevices":"ALL",
         "bannerId":"675832210",
         "campaignId":"495181210",
         "campaignType":"network",
         "supportedVersion":"",
         "storeRating":"4.3",
         "storeDownloads":"10000+",
         "appSize":"34603008",
         "urlVideo":"",
         "urlVideoHigh":"",
         "urlVideo30Sec":"https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
         "urlVideo30SecHigh":"https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
         "offerId":"5825774"
      },

I dont need all that data, just a few like 'title', 'country', 'revenuerate' and 'urlApp' but I dont know if there is a way to extract only that. My solution so far was to make the json a dataframe and then drop the columns, however, I wanted to find an easier solution.

My ideal final result would be to have a dataframe with selected keys and arrays Does anybody know an easy solution for this problem?

Thanks

Upvotes: 0

Views: 12791

Answers (3)

use a dictionary comprehension to extract a dictionary of key/value pairs you want

import json

json_string="""{
     "id":"675832210",
     "title":"AGED",
     "desc":"No annoying ads&easy to play",
     "urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
     "revenueRate":"0.10",
     "categories":"Card",
     "idx":"2",
     "country":[
        "CH"
     ],
     "cityInclude":[
        "ALL"
     ],
     "cityExclude":[
        
     ]
  }"""

 json_dict = json.loads(json_string)

 filter_fields=['title','country','revenueRate','urlApp']    

 dict_result = { key: json_dict[key]  for key in json_dict if key in filter_fields}

 json_elements = []
 for key in dict_result:
     json_elements.append((key,json_dict[key]))

 print(json_elements)

output:

 [('title', 'AGED'), ('urlApp', 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='), ('revenueRate', '0.10'), ('country', ['CH'])]

Upvotes: 0

schilli
schilli

Reputation: 1848

I assume you have that data as a dictionary, let's call it json_data. You can just iterate over the apps and write them into a list. Alternatively, you could obviously also define a class and initialize objects of that class.

EDIT: I just found this answer: https://stackoverflow.com/a/20638258/6180150, which tells how you can convert a list of dicts like from my sample code into a dataframe. See below adaptions to the code for a solution.

json_data = {
    "status": "OK",
    "statuscode": 200,
    "message": "success",
    "apps": [
        {
            "id": "675832210",
            "title": "AGED",
            "desc": "No annoying ads&easy to play",
            "urlImg": "https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
            "urlImgWide": "https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
            "urlApp": "https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
            "androidPackage": "com.agedstudio.freecell",
            "revenueType": "cpi",
            "revenueRate": "0.10",
            "categories": "Card",
            "idx": "2",
            "country": [
                "CH"
            ],
            "cityInclude": [
                "ALL"
            ],
            "cityExclude": [

            ],
            "targetedOSver": "ALL",
            "targetedDevices": "ALL",
            "bannerId": "675832210",
            "campaignId": "495181210",
            "campaignType": "network",
            "supportedVersion": "",
            "storeRating": "4.3",
            "storeDownloads": "10000+",
            "appSize": "34603008",
            "urlVideo": "",
            "urlVideoHigh": "",
            "urlVideo30Sec": "https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
            "urlVideo30SecHigh": "https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
            "offerId": "5825774"
        },
    ]
}

filtered_data = []
for app in json_data["apps"]:
    app_data = {
        "id": app["id"],
        "title": app["title"],
        "country": app["country"],
        "revenueRate": app["revenueRate"],
        "urlApp": app["urlApp"],
    }
    filtered_data.append(app_data)

print(filtered_data)

# Output
d = [
    {
        'id': '675832210',
        'title': 'AGED',
        'country': ['CH'],
        'revenueRate': '0.10',
        'urlApp': 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='
    }
]

d = pd.DataFrame(filtered_data)
print(d)

# Output
          id title country revenueRate urlApp
0  675832210  AGED    [CH]        0.10 https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=

Upvotes: 1

Jonathan Leon
Jonathan Leon

Reputation: 5648

if your endgame is dataframe, just load the dataframe and take the columns you want:

setting the json to data

df = pd.json_normalize(data['apps'])

yields

          id title                          desc                                             urlImg  ... urlVideoHigh                                      urlVideo30Sec                                  urlVideo30SecHigh  offerId
0  675832210  AGED  No annoying ads&easy to play  https://test.com/pImg.aspx?b=675832&z=1041813&...  ...               https://cdn.test.com/banner/video/video-675832...  https://cdn.test.com/banner/video/video-675832...  5825774

[1 rows x 28 columns]

then if you want certain columns:

df_final = df[['title', 'desc', 'urlImg']]

  title                          desc                                             urlImg
0  AGED  No annoying ads&easy to play  https://test.com/pImg.aspx?b=675832&z=1041813&...

Upvotes: 0

Related Questions