Rafael Lima
Rafael Lima

Reputation: 420

Extract values from dicts inside lists

I'm trying to extract the values from this JSON file, but I having some trouble to extract the data inside from lists in the dict values. For example, in the city and state, I would like to get only the name values and create a Pandas Dataframe and select only some keys like this.

enter image description here

I tried using some for with get methods techniques, but without success.

{'birthday': ['1987-07-13T00:00:00.000Z'],
 'cpf': ['9999999999999'],
 'rg': [],
 'gender': ['Feminino'],
 'email': ['[email protected]'],
 'phone_numbers': ['51999999999'],
 'photo': [],
 'id': 11111111,
 'duplicate_id': -1,
 'name': 'My User',
 'cnpj': [],
 'company_name': '[]',
 'city': [{'id': 0001, 'name': 'Porto Alegre'}],
 'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
 'type': 'Private Person',
 'tags': [],
 'pending_tickets_count': 0}

Upvotes: 0

Views: 46

Answers (1)

Asish M.
Asish M.

Reputation: 2647

In [123]: data
Out[123]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
 'cpf': ['9999999999999'],
 'rg': [],
 'gender': ['Feminino'],
 'email': ['[email protected]'],
 'phone_numbers': ['51999999999'],
 'photo': [],
 'id': 11111111,
 'duplicate_id': -1,
 'name': 'My User',
 'cnpj': [],
 'company_name': '[]',
 'city': [{'id': '0001', 'name': 'Porto Alegre'}],
 'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
 'type': 'Private Person',
 'tags': [],
 'pending_tickets_count': 0}

In [124]: data2 = {k:v for k,v in data.items() if k in required}

In [125]: data2
Out[125]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
 'gender': ['Feminino'],
 'id': 11111111,
 'name': 'My User',
 'city': [{'id': '0001', 'name': 'Porto Alegre'}],
 'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}]}

In [126]: pd.DataFrame(data2).assign(
     ...:     city_name=lambda x: x['city'].str.get('name'),
     ...:     state_name=lambda x: x['state'].str.get('name'),
     ...:     state_fs=lambda x: x['state'].str.get('fs')
     ...:     ).drop(['state', 'city'], axis=1)
Out[126]:
                   birthday    gender        id     name     city_name         state_name state_fs
0  1987-07-13T00:00:00.000Z  Feminino  11111111  My User  Porto Alegre  Rio Grande do Sul       RS

reason why data2 is required is that you can't have columns that differ in length. So in this case, pd.DataFrame(data) won't work as rg has 0 items but birthday has 1 item.

Also something to look at if you are directly dealing with json files is pd.json_normalize

Upvotes: 1

Related Questions