Javier Lopez Tomas
Javier Lopez Tomas

Reputation: 2342

Extract JSON with different values to repeated id column in pandas

I have the following dataframe:

df = pd.DataFrame({'id':['0001', '0001'],
                   'vat_countries': [{'vat': 21, 'country': 'ES'}, 
                                     {'vat': 23, 'country': 'GR'}]
                   })

id        vat_countries
0001     {'vat': 21, 'country': 'ES'}
0001     {'vat': 23, 'country': 'GR'}

What I want to get is:

id   vat  country
0001  21    'ES'
0001  23    'GR'

Reading other SO questions I got the following code:

df = df.drop('vat_countries', 1).assign(**pd.DataFrame(list_df['vat_countries'].values.tolist()))

However, that gives me:

id   vat  country
    0001  21    'ES'
    0001  21    'ES'

which is wrong.

I have been able to get the results that I want using:

c = pd.concat([pd.DataFrame(df[column].values.tolist()), 
               df.drop(column, 1).reset_index()], 
              axis=1, ignore_index=True)

But this requires to manually type the column names. Otherwise, the column names are 0, 1, 2, 3...

Is there any way I get my desired output while conserving the name of the columns? Thanks

EDIT: Trying BEN_YO solution. I have this enter image description here And after the code I get this enter image description here Everything gets duplicated twice

Upvotes: 1

Views: 168

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

You could access the individual values using string methods :

df["vat"] = df.vat_countries.str["vat"]
df["country"] = df.vat_countries.str["country"]
df      

    id         vat_countries               vat  country
0   0001    {'vat': 21, 'country': 'ES'}    21  ES
1   0001    {'vat': 23, 'country': 'GR'}    23  GR

Upvotes: 1

BENY
BENY

Reputation: 323226

Try with pop fix your code

df.join(pd.DataFrame(df.pop('vat_countries').tolist(),index=df.index))
Out[300]: 
     id  vat country
0  0001   21      ES
1  0001   23      GR

Upvotes: 6

Daweo
Daweo

Reputation: 36390

I would apply pd.Series at columns with dicts and join result with original, i.e.:

import pandas as pd
df = pd.DataFrame({'id':['0001', '0001'], 'vat_countries': [{'vat': 21, 'country': 'ES'}, {'vat': 23, 'country': 'GR'}]})
final_df = df.join(df.vat_countries.apply(pd.Series))
print(final_df)

Output:

     id                 vat_countries  vat country
0  0001  {'vat': 21, 'country': 'ES'}   21      ES
1  0001  {'vat': 23, 'country': 'GR'}   23      GR

As you can see vat_countires was left, if you wish to jettison it you might simply drop it.

Upvotes: 0

Related Questions