Reputation: 117
I have a DataFrame with a column, 'mjtheme_namecode', in which each entry is a list of dictionaries, each holding a code and a name. Every dictionary has a code, but some of the names are empty strings. I would like to fill in the missing names using other pairs that have the same code. Here is the column in question:
import pandas as pd
import json
import numpy as np
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
df = pd.read_json('data/world_bank_projects.json')
print(df['mjtheme_namecode'].head(15))
0 [{'code': '8', 'name': 'Human development'}, {...
1 [{'code': '1', 'name': 'Economic management'},...
2 [{'code': '5', 'name': 'Trade and integration'...
3 [{'code': '7', 'name': 'Social dev/gender/incl...
4 [{'code': '5', 'name': 'Trade and integration'...
5 [{'code': '6', 'name': 'Social protection and ...
6 [{'code': '2', 'name': 'Public sector governan...
7 [{'code': '11', 'name': 'Environment and natur...
8 [{'code': '10', 'name': 'Rural development'}, ...
9 [{'code': '2', 'name': 'Public sector governan...
10 [{'code': '10', 'name': 'Rural development'}, ...
11 [{'code': '10', 'name': 'Rural development'}, ...
12 [{'code': '4', 'name': ''}]
13 [{'code': '5', 'name': 'Trade and integration'...
14 [{'code': '6', 'name': 'Social protection and ...
Name: mjtheme_namecode, dtype: object
I know I could turn the column into a separate DataFrame and then ffill, but I think I would have to reindex, so I don't think I could put the result back in place afterwards. Ideally, I'd build a list (with no duplicates) of only the dicts that have both a code and a name, then loop over the column and set each name to the matching value from that list. Does this make sense? I'm not sure how to go about it.
Upvotes: 1
Reputation: 51165
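You can take a similar approach: build a new DataFrame to derive a code-to-name lookup, then map the result back onto the original column:
# Flatten every list of {'code', 'name'} dicts into one row per pair
theme = pd.DataFrame([val for pair in df['mjtheme_namecode'].values for val in pair])
# Treat empty names as missing, drop them, and map each code to its name
mapper = theme.drop_duplicates().replace('', np.nan).dropna().set_index('code')['name'].to_dict()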
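To make the lookup step easier to follow, here is a minimal sketch on made-up data. The `toy` Series is a hypothetical stand-in for `df['mjtheme_namecode']` (it is not the real World Bank file), and the steps are reordered slightly so that blanks are dropped before duplicates, which should give the same dictionary:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['mjtheme_namecode']: each cell is a list of dicts.
toy = pd.Series([
    [{'code': '8', 'name': 'Human development'}, {'code': '10', 'name': ''}],
    [{'code': '10', 'name': 'Rural development'}],
    [{'code': '8', 'name': ''}],
])

# Flatten the lists into one row per {'code', 'name'} pair.
theme = pd.DataFrame([pair for pairs in toy for pair in pairs])

# Blank names become NaN and are dropped; the remaining unique pairs form the lookup.
mapper = (
    theme.replace('', np.nan)
         .dropna()
         .drop_duplicates()
         .set_index('code')['name']
         .to_dict()
)
print(mapper)
```

Each code ends up as a key with its non-empty name as the value, so the blank entries for codes 8 and 10 can be looked up later.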
Using a list comprehension to put it all together:
s = pd.Series(
    [[{'code': i['code'], 'name': mapper[i['code']]}
      for i in t] for t in df.mjtheme_namecode]
)
s.head(13)
0 [{'code': '8', 'name': 'Human development'}, {...
1 [{'code': '1', 'name': 'Economic management'},...
2 [{'code': '5', 'name': 'Trade and integration'...
3 [{'code': '7', 'name': 'Social dev/gender/incl...
4 [{'code': '5', 'name': 'Trade and integration'...
5 [{'code': '6', 'name': 'Social protection and ...
6 [{'code': '2', 'name': 'Public sector governan...
7 [{'code': '11', 'name': 'Environment and natur...
8 [{'code': '10', 'name': 'Rural development'}, ...
9 [{'code': '2', 'name': 'Public sector governan...
10 [{'code': '10', 'name': 'Rural development'}, ...
11 [{'code': '10', 'name': 'Rural development'}, ...
12 [{'code': '4', 'name': 'Financial and private ...
dtype: object
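As you can see, row 12, which previously had an empty name, has been filled in, and the other rows keep their names. You can assign the result back with df['mjtheme_namecode'] = s. Note that mapper[i['code']] raises a KeyError for any code that never appears with a name anywhere in the data.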
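For the reassignment step, here is a hedged sketch on a hypothetical three-row frame (the index values and the hand-written `mapper` are assumptions for illustration only). It uses `dict.get` so a code with no known name keeps its original value instead of raising, and it passes `index=df.index` because assigning a Series to a column aligns on index labels:

```python
import pandas as pd

# Hypothetical frame with a non-default index, to show why the index matters.
df = pd.DataFrame(
    {'mjtheme_namecode': [
        [{'code': '10', 'name': ''}],
        [{'code': '10', 'name': 'Rural development'}],
        [{'code': '99', 'name': ''}],  # made-up code that never appears with a name
    ]},
    index=[10, 20, 30],
)

# Lookup written out by hand here to keep the sketch short.
mapper = {'10': 'Rural development'}

# .get falls back to the existing name when the code is not in the lookup.
filled = [
    [{'code': d['code'], 'name': mapper.get(d['code'], d['name'])} for d in pairs]
    for pairs in df['mjtheme_namecode']
]

# Reusing df.index keeps each list on its original row when assigned back.
df['mjtheme_namecode'] = pd.Series(filled, index=df.index)
print(df)
```

With the default index that `pd.read_json` produces in the question, the plain `pd.Series(...)` from the answer should line up without the extra `index=` argument.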
Upvotes: 1