Reputation: 505
I'm trying to import JSON data to Dataframe via json_normalize but cannot get it to work.
My data:
a key is same as c1 key
[
{
"a": "A1",
"b": "B1",
"c": [
{
"c1": "C111",
"c2": "C121",
"c3": ["C1131","C1132"]
}
]
},
{
"a": "A2",
"b": "B2",
"c": [
{
"c1": "C211",
"c2": "C212",
"c3": ["C2131","C2132"]
},
{
"c1": "C221",
"c2": "C222",
"c3": ["C2231"]
}
]
}
]
I want to make a DataFrame like
a c1(a) c2 c3
0 A1 C111 C121 ["C1131","C1132"]
1 A2 C211 C212 ["C2131","C2132"]
2 A2 C221 C222 ["C2231"]
When I use json_normalize it shows ValueError
:
entity_df = json_normalize(data, 'c', 'a')
ValueError: Conflicting metadata name a, need distinguishing prefix
How should I change the json_normalize parameters? Any help will be appreciated.
Upvotes: 3
Views: 685
Reputation: 93151
If you have already gone through the pain of obscuring your real data, make the mock data having the same features as the real one as well.
Assuming you have this JSON:
json_data =\
[{'a': 'A1',
'b': 'B1',
'c': [{'a': 'C111', 'c2': 'C121', 'c3': ['C1131', 'C1132']}]},
{'a': 'A2',
'b': 'B2',
'c': [{'a': 'C211', 'c2': 'C212', 'c3': ['C2131', 'C2132']},
{'a': 'C221', 'c2': 'C222', 'c3': ['C2231']}]}]
You only need one line of code to extract it:
pd.json_normalize(json_data, 'c', ['a', 'b'], record_prefix='data.')
Result:
data.a data.c2 data.c3 a b
0 C111 C121 [C1131, C1132] A1 B1
1 C211 C212 [C2131, C2132] A2 B2
2 C221 C222 [C2231] A2 B2
The record_prefix='data.'
is what the ValueError: Conflicting metadata name a, need distinguishing prefix
error message meant.
Upvotes: 1
Reputation: 17322
you can try:
from collections import defaultdict
norm_data = defaultdict(list)
for item in data:
for element in item['c']:
norm_data['a'].append(item['a'])
for k, v in element.items():
if k in {'a', 'c1'}:
norm_data['c1(a)'].append(v)
else:
norm_data[k].append(v)
pd.DataFrame(norm_data)
Upvotes: 1
Reputation: 3604
data = [
{
"a": "A1",
"b": "B1",
"c": [
{
"c1": "C111",
"c2": "C121",
"c3": ["C1131","C1132"]
}
]
},
{
"a": "A2",
"b": "B2",
"c": [
{
"c1": "C211",
"c2": "C212",
"c3": ["C2131","C2132"]
},
{
"c1": "C221",
"c2": "C222",
"c3": ["C2231"]
}
]
}
]
pd.io.json.json_normalize(data,"c", ['a', 'b',])
Output:
c1 c2 c3 a b
0 C111 C121 [C1131, C1132] A1 B1
1 C211 C212 [C2131, C2132] A2 B2
2 C221 C222 [C2231] A2 B2
Upvotes: 0