Simon
Simon

Reputation: 33

Efficiently converting combination of pandas dataframe column into key-value pair

I have a dataframe and wish to convert it to key value pair: My dataframe:

Institution    Funding
ETH             WTO
Harbin        WTO, CASA, WHO
SJSU          NASA, WTO
JAIST           NASA

If I had single entry in the "Funding" Column. I was able to achieve it with built-ins like this:

dict(zip(df['Funding'], df['Institution']))

But, with more complicated case (with multiple entries and comma's and white space). This doesn't work. What can be done? (also ideally I would like to use all the distinct entries in Funding column. For details: see the dataframe above and expected output below)

Expected output:

{'ETH': {'WTO':True, 'CASA': False, 'WHO': False, 'NASA': False},
'Harbin': {'WTO':True, 'CASA': True, 'WHO': True, 'NASA': False},
'SJSU': {'WTO':True, 'CASA': False, 'WHO': False, 'NASA': True},
'JAIST':{'WTO':False, 'CASA': False, 'WHO': False, 'NASA': True}
}

Upvotes: 3

Views: 39

Answers (1)

jezrael
jezrael

Reputation: 863291

Use DataFrame.set_index for index by Institution, select Funding, create dummies columns by Series.str.get_dummies, convert to boolean and then to nested dictionary by DataFrame.to_dict:

d = (df.set_index('Institution')['Funding']
       .str.get_dummies(', ')
       .astype(bool)
       .to_dict('index'))
print (d)
{'ETH': {'CASA': False, 'NASA': False, 'WHO': False, 'WTO': True},
 'Harbin': {'CASA': True, 'NASA': False, 'WHO': True, 'WTO': True}, 
 'SJSU': {'CASA': False, 'NASA': True, 'WHO': False, 'WTO': True}, 
 'JAIST': {'CASA': False, 'NASA': True, 'WHO': False, 'WTO': False}}

Upvotes: 3

Related Questions