ChiChi
ChiChi

Reputation: 103

Mapping keys from dict to list of dicts in dataframe column

I have a dataframe(df) that looks like this:

         a   b                                              c
1    35-23  RB   {'1': '1', '13': '15', '14': '30', '20': '3'}
2    24-10  RB              {'1': '1', '13': '17', '14': '86'}
3    26-30  RB              {'1': '1', '13': '14', '14': '54'}  
4     14-7  RB                                              []
5    24-27  RB   {'1': '1', '13': '20', '14': '94', '15': '1'}
6    38-52  RB     {'1': '1', '70': '3', '72': '1', '84': '1'}
7    False  RB   {'1': '1', '70': '4', '71': '1', '72': '0.5'}
8    15-20  RB                                              []
9    30-17  RB                                              []

And I have another dictionary that gives the human readable names (approx. 100 key, value pairs) to the dictionary keys in column c. It looks something like this:

dict_names = {1: 'Test', 13: 'Ind Score', 14: 'Successful', 15: 'Cofactor', 20: 'Attempts', etc ...}

What I would like to do is map my dict_names values to the keys in my list of dicts in column c of my df, so I have human readable names. I would then convert the keys for the list of dicts (column c), the human readable names, to column names in my df.

I know I can convert the dicts in column c to different columns by doing something like this:

df['c'] = df['c'].apply(lambda x : dict(eval(x)))
df_stats = df['c'].apply(pd.Series)

Then I can concat the df_stats with the df and drop the old column c. Voila!

But I am stuck with mapping the dict_names values with the keys of the list of dicts in column c of my df.

The final df output I am looking for would be something like this:

         a   b  Test    Ind Score   Successful      Cofactor       Attempts ....
1    35-23  RB      1          15           30           NaN              3
2    24-10  RB      1          17           86           NaN            NaN  
3    26-30  RB      1          17           86           NaN            NaN              
4     14-7  RB    NaN         NaN          NaN           NaN            NaN
5    24-27  RB      1          20           94             1            NaN
6    38-52  RB      1          NaN         NaN           NaN            NaN
7    False  RB      1          NaN         NaN           NaN            NaN
8    15-20  RB    NaN          NaN         NaN           NaN            NaN
9    30-17  RB    NaN          NaN         NaN           NaN            NaN 

Help finding my way to my final output would be appreciated.

Upvotes: 0

Views: 152

Answers (1)

BENY
BENY

Reputation: 323226

You can using rename and pass a dict to replace the columns; name

from ast import literal_eval 
df['c'] = df['c'].apply(literal_eval) 
#dd={'1': 'Test', '13': 'Ind Score', '14': 'Successful', '15': 'Cofactor', '20': 'Attempts'}
dd= {str(k):v for k,v in dict_names.items()}#change
pd.concat([df.drop('c',1),df['c'].apply(pd.Series).rename(columns=dd)],1)

Out[1320]: 
       a   b Test Ind Score Successful Attempts
0  35-23  BR    1        15         30        3
1  24-10  BR    1        17         86      NaN
2  26-30  BR    1        14         54      NaN

Upvotes: 1

Related Questions