Devarshi Goswami

Reputation: 1225

python dict append values to dataframe after adding a row for dict keys

I have a dictionary where the keys are GitHub repository names and the values contain JSON-formatted data.

Example:


    {'r1':[
       {'id': 1178421030,
       'name': 'x',
        },
       {'id': 1178420990,
       'name': 'y',
       }],
    'r2':[
       {'id': 1178421031,
       'name': 'a',
        },
       {'id': 1178420950,
       'name': 'b',
       }]
    }

I can create a dataframe from the JSON values in the dict using:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so build the frames and concatenate once
df = pd.concat([pd.json_normalize(responses[i]) for i in responses])

This gives me a df that looks like this:

         id  name
 1178421030     x
 1178420990     y
 1178421031     a
 1178420950     b

I want the keys of the dict as another column named repo_name in the df, something like:

         id  name  repo_name
 1178421030     x         r1
 1178420990     y         r1
 1178421031     a         r2
 1178420950     b         r2

How should I go about doing this?

Upvotes: 0

Views: 52

Answers (2)

sammywemmy

Reputation: 28669

I would suggest using collections.defaultdict; it gives you more control over how the data is collected:

from collections import defaultdict

d = defaultdict(list)
for key, value in data.items():
    for entry in value:
        d["id"].append(entry["id"])
        d["name"].append(entry["name"])
        d["repo_name"].append(key)

d

defaultdict(list,
            {'id': [1178421030, 1178420990, 1178421031, 1178420950],
             'name': ['x', 'y', 'a', 'b'],
             'repo_name': ['r1', 'r1', 'r2', 'r2']})

Create dataframe:

pd.DataFrame(d)

           id name repo_name
0  1178421030    x        r1
1  1178420990    y        r1
2  1178421031    a        r2
3  1178420950    b        r2
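
If the entries carry more fields than id and name, listing each one by hand gets tedious. A minimal sketch of a variation, assuming every entry is a flat dict as in the question's sample (the names rows, repo and entries are only illustrative): copy each entry whole and tag it with its key.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    "r1": [{"id": 1178421030, "name": "x"}, {"id": 1178420990, "name": "y"}],
    "r2": [{"id": 1178421031, "name": "a"}, {"id": 1178420950, "name": "b"}],
}

# One flat row per entry: keep every field and add the dict key as repo_name.
rows = [
    {**entry, "repo_name": repo}
    for repo, entries in data.items()
    for entry in entries
]

df = pd.DataFrame(rows)
print(df)
```

This trades the per-field control of the defaultdict loop for not having to know the field names in advance.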

Another option would be to use json_normalize in a list comprehension:

pd.concat(pd.json_normalize(data, record_path=[key]).assign(repo_name=key) 
          for key in data)
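
A related sketch, assuming the same sample data: pd.concat also accepts a dict of dataframes and uses the dict keys as an outer index level, which can then be moved into a column. The level name repo_name is passed through the names argument; the variable names are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    "r1": [{"id": 1178421030, "name": "x"}, {"id": 1178420990, "name": "y"}],
    "r2": [{"id": 1178421031, "name": "a"}, {"id": 1178420950, "name": "b"}],
}

df = (
    # The dict keys become the outer index level, named "repo_name".
    pd.concat(
        {repo: pd.json_normalize(entries) for repo, entries in data.items()},
        names=["repo_name"],
    )
    # Move that level into an ordinary column, then discard the leftover
    # per-repo row numbers in favour of a fresh 0..n-1 index.
    .reset_index(level="repo_name")
    .reset_index(drop=True)
)
print(df)
```

Note that reset_index places repo_name as the first column rather than the last.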

Upvotes: 1

Billy Bonaros

Reputation: 1721

Let's say your dictionary is called "d":

import pandas as pd

data = pd.DataFrame()
for i in d.keys():
    z = pd.DataFrame(d[i])   # one frame per repository
    z['repo_name'] = i       # tag every row with its dict key
    data = pd.concat([data, z])



           id name repo_name
0  1178421030    x        r1
1  1178420990    y        r1
0  1178421031    a        r2
1  1178420950    b        r2
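
The index above restarts at 0 for each repository because every frame keeps its own row numbers. A possible variation, sketched on the question's sample data (frames, repo and entries are illustrative names): collect the frames in a list, concatenate once, and pass ignore_index=True to get a continuous index.

```python
import pandas as pd

# Sample data copied from the question.
d = {
    "r1": [{"id": 1178421030, "name": "x"}, {"id": 1178420990, "name": "y"}],
    "r2": [{"id": 1178421031, "name": "a"}, {"id": 1178420950, "name": "b"}],
}

frames = []
for repo, entries in d.items():
    frame = pd.DataFrame(entries)
    frame["repo_name"] = repo  # tag every row with its dict key
    frames.append(frame)

# A single concat avoids re-copying the growing result on every iteration;
# ignore_index=True renumbers the rows 0..n-1.
data = pd.concat(frames, ignore_index=True)
print(data)
```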

Upvotes: 2
