Reputation: 33
I have a dataframe and wish to convert it to key value pair: My dataframe:
Institution Funding
ETH WTO
Harbin WTO, CASA, WHO
SJSU NASA, WTO
JAIST NASA
If I had single entry in the "Funding" Column. I was able to achieve it with built-ins like this:
dict(zip(df['Funding'], df['Institution']))
But, with more complicated case (with multiple entries and comma's and white space). This doesn't work. What can be done? (also ideally I would like to use all the distinct entries in Funding column. For details: see the dataframe above and expected output below)
Expected output:
{'ETH': {'WTO':True, 'CASA': False, 'WHO': False, 'NASA': False},
'Harbin': {'WTO':True, 'CASA': True, 'WHO': True, 'NASA': False},
'SJSU': {'WTO':True, 'CASA': False, 'WHO': False, 'NASA': True},
'JAIST':{'WTO':False, 'CASA': False, 'WHO': False, 'NASA': True}
}
Upvotes: 3
Views: 39
Reputation: 863291
Use DataFrame.set_index
for index by Institution
, select Funding
, create dummies columns by Series.str.get_dummies
, convert to boolean and then to nested dictionary by DataFrame.to_dict
:
d = (df.set_index('Institution')['Funding']
.str.get_dummies(', ')
.astype(bool)
.to_dict('index'))
print (d)
{'ETH': {'CASA': False, 'NASA': False, 'WHO': False, 'WTO': True},
'Harbin': {'CASA': True, 'NASA': False, 'WHO': True, 'WTO': True},
'SJSU': {'CASA': False, 'NASA': True, 'WHO': False, 'WTO': True},
'JAIST': {'CASA': False, 'NASA': True, 'WHO': False, 'WTO': False}}
Upvotes: 3