Reputation: 1225
I have a dictionary where the keys are GitHub repository names and the values contain JSON-formatted data.
For example:
{'r1':[
{'id': 1178421030,
'name': 'x',
},
{'id': 1178420990,
'name': 'y',
}],
'r2':[
{'id': 1178421031,
'name': 'a',
},
{'id': 1178420950,
'name': 'b',
}]
}
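I can create a DataFrame from the JSON values in the dict using:
import pandas as pd

df = pd.DataFrame()
for i in responses:
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, pd.json_normalize(responses[i])])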
This gives me a df that looks like this:
id name
1178421030 x
1178420990 y
1178421031 a
1178420950 b
I want the keys of the dict as another column named repo_name in the df, something like:
id name repo_name
1178421030 x r1
1178420990 y r1
1178421031 a r2
1178420950 b r2
How should I go about doing this?
Upvotes: 0
Views: 52
Reputation: 28669
I would suggest using collections.defaultdict; it should give you more control over how the data is collected:
from collections import defaultdict

d = defaultdict(list)
for key, value in data.items():
    for entry in value:
        # one list per output column, filled row by row
        d["id"].append(entry["id"])
        d["name"].append(entry["name"])
        d["repo_name"].append(key)

d
defaultdict(list,
{'id': [1178421030, 1178420990, 1178421031, 1178420950],
'name': ['x', 'y', 'a', 'b'],
'repo_name': ['r1', 'r1', 'r2', 'r2']})
Create dataframe:
pd.DataFrame(d)
id name repo_name
0 1178421030 x r1
1 1178420990 y r1
2 1178421031 a r2
3 1178420950 b r2
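If the records carry more fields than id and name, the same idea can be written without naming each field by hand. This is only a sketch, assuming every value in the dict is a list of flat dicts like in the question; the names records and flat_df are made up for illustration:

```python
import pandas as pd

# Sample data copied from the question
data = {
    "r1": [{"id": 1178421030, "name": "x"}, {"id": 1178420990, "name": "y"}],
    "r2": [{"id": 1178421031, "name": "a"}, {"id": 1178420950, "name": "b"}],
}

# Copy each record and tag it with the repository key it came from
records = [
    {**entry, "repo_name": repo}
    for repo, entries in data.items()
    for entry in entries
]

flat_df = pd.DataFrame(records)
print(flat_df)
```

Nested fields would stay as dicts in a single column here, so json_normalize is probably the better fit if the real GitHub payload is nested.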
Another option would be to use json_normalize in a generator expression passed to pd.concat:
pd.concat(
    pd.json_normalize(data, record_path=[key]).assign(repo_name=key)
    for key in data
)
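For reference, here is a self-contained variant of the json_normalize approach that can be pasted and run as is. It is a sketch under the assumption that the input looks like the sample in the question; the names frames and normalized_df are illustrative, and ignore_index=True is added so the result gets a continuous 0..n-1 index:

```python
import pandas as pd

# Sample data copied from the question
data = {
    "r1": [{"id": 1178421030, "name": "x"}, {"id": 1178420990, "name": "y"}],
    "r2": [{"id": 1178421031, "name": "a"}, {"id": 1178420950, "name": "b"}],
}

# One normalized frame per repository, each tagged with its key
frames = [
    pd.json_normalize(records).assign(repo_name=repo)
    for repo, records in data.items()
]

# ignore_index=True discards the per-repository 0, 1 indexes
normalized_df = pd.concat(frames, ignore_index=True)
print(normalized_df)
```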
Upvotes: 1
Reputation: 1721
Let's say your dict is called d:
data = pd.DataFrame()
for i in d.keys():
    z = pd.DataFrame(d[i])   # one frame per repository
    z['repo_name'] = i       # tag every row with the dict key
    data = pd.concat([data, z])
id name repo_name
0 1178421030 x r1
1 1178420990 y r1
0 1178421031 a r2
1 1178420950 b r2
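The loop above can also be collapsed into a single pd.concat call, because concat accepts a dict and turns its keys into an outer index level. A minimal sketch, assuming the same d as above; the names stacked and result are illustrative:

```python
import pandas as pd

# Sample data copied from the question
d = {
    "r1": [{"id": 1178421030, "name": "x"}, {"id": 1178420990, "name": "y"}],
    "r2": [{"id": 1178421031, "name": "a"}, {"id": 1178420950, "name": "b"}],
}

# Dict keys become the outer index level, which is named repo_name here
stacked = pd.concat(
    {repo: pd.DataFrame(rows) for repo, rows in d.items()},
    names=["repo_name"],
)

# Move the outer level into a column, drop the leftover 0, 1, 0, 1 index,
# and put the columns in the order asked for in the question
result = stacked.reset_index(level=0).reset_index(drop=True)
result = result[["id", "name", "repo_name"]]
print(result)
```

This avoids growing a DataFrame inside a loop, which copies the data on every iteration.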
Upvotes: 2