moussesj94
moussesj94

Reputation: 505

How to flatten nest Json data with json_normalize

I'm trying to import JSON data to Dataframe via json_normalize but cannot get it to work.

My data:

a key is same as c1 key

[
  {
    "a": "A1",
    "b": "B1",
    "c": [
      {
        "c1": "C111",
        "c2": "C121",
        "c3": ["C1131","C1132"]
      }
    ]
  },
  {
    "a": "A2",
    "b": "B2",
    "c": [
      {
        "c1": "C211",
        "c2": "C212",
        "c3": ["C2131","C2132"]
      },
      {
        "c1": "C221",
        "c2": "C222",
        "c3": ["C2231"]
      }
    ]
  }
]

I want to make a DataFrame like

           a     c1(a)      c2                    c3
0         A1      C111    C121     ["C1131","C1132"]
1         A2      C211    C212     ["C2131","C2132"]
2         A2      C221    C222             ["C2231"]

When I use json_normalize it shows ValueError:

entity_df = json_normalize(data, 'c', 'a')

ValueError: Conflicting metadata name a, need distinguishing prefix 

How should I change the json_normalize parameters? Any help will be appreciated.

Upvotes: 3

Views: 685

Answers (3)

Code Different
Code Different

Reputation: 93151

If you have already gone through the pain of obscuring your real data, make the mock data having the same features as the real one as well.

Assuming you have this JSON:

json_data =\
[{'a': 'A1',
  'b': 'B1',
  'c': [{'a': 'C111', 'c2': 'C121', 'c3': ['C1131', 'C1132']}]},
 {'a': 'A2',
  'b': 'B2',
  'c': [{'a': 'C211', 'c2': 'C212', 'c3': ['C2131', 'C2132']},
   {'a': 'C221', 'c2': 'C222', 'c3': ['C2231']}]}]

You only need one line of code to extract it:

pd.json_normalize(json_data, 'c', ['a', 'b'], record_prefix='data.')

Result:

  data.a data.c2         data.c3   a   b
0   C111    C121  [C1131, C1132]  A1  B1
1   C211    C212  [C2131, C2132]  A2  B2
2   C221    C222         [C2231]  A2  B2

The record_prefix='data.' is what the ValueError: Conflicting metadata name a, need distinguishing prefix error message meant.

Upvotes: 1

kederrac
kederrac

Reputation: 17322

you can try:

from collections import defaultdict

norm_data = defaultdict(list)
for item in data:
    for element in item['c']:
        norm_data['a'].append(item['a'])
        for k, v in element.items():
            if k in {'a', 'c1'}:
                norm_data['c1(a)'].append(v)
            else:
                norm_data[k].append(v)

pd.DataFrame(norm_data)

enter image description here

Upvotes: 1

adnanmuttaleb
adnanmuttaleb

Reputation: 3604

data = [
  {
    "a": "A1",
    "b": "B1",
    "c": [
      {
        "c1": "C111",
        "c2": "C121",
        "c3": ["C1131","C1132"]
      }
    ]
  },
  {
    "a": "A2",
    "b": "B2",
    "c": [
      {
        "c1": "C211",
        "c2": "C212",
        "c3": ["C2131","C2132"]
      },
      {
        "c1": "C221",
        "c2": "C222",
        "c3": ["C2231"]
      }
    ]
  }
]
pd.io.json.json_normalize(data,"c", ['a', 'b',])

Output:

    c1      c2      c3              a   b
0   C111    C121    [C1131, C1132]  A1  B1
1   C211    C212    [C2131, C2132]  A2  B2
2   C221    C222    [C2231]         A2  B2

Upvotes: 0

Related Questions