Filling in blank dictionary values based on other key value pairs

Question

I have a DataFrame with a column, 'mjtheme_namecode', whose values are lists of dictionaries, each containing a code and a name. Every entry has a code, but some of the names are blank. I would like to fill in the missing names from other entries that share the same code. Here is the column in question:

import pandas as pd
import json
import numpy as np
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
df = pd.read_json('data/world_bank_projects.json')
print(df['mjtheme_namecode'].head(15))

0     [{'code': '8', 'name': 'Human development'}, {...
1     [{'code': '1', 'name': 'Economic management'},...
2     [{'code': '5', 'name': 'Trade and integration'...
3     [{'code': '7', 'name': 'Social dev/gender/incl...
4     [{'code': '5', 'name': 'Trade and integration'...
5     [{'code': '6', 'name': 'Social protection and ...
6     [{'code': '2', 'name': 'Public sector governan...
7     [{'code': '11', 'name': 'Environment and natur...
8     [{'code': '10', 'name': 'Rural development'}, ...
9     [{'code': '2', 'name': 'Public sector governan...
10    [{'code': '10', 'name': 'Rural development'}, ...
11    [{'code': '10', 'name': 'Rural development'}, ...
12                          [{'code': '4', 'name': ''}]
13    [{'code': '5', 'name': 'Trade and integration'...
14    [{'code': '6', 'name': 'Social protection and ...
Name: mjtheme_namecode, dtype: object

I know I could pull the column out into a separate DataFrame and then ffill, but I think I would have to reindex, so I don't think I could put it back in place afterwards. Ideally, I'd build a list (with no duplicates) of only the dict items that have both a code and a name, then loop over the column's dictionaries and set each name to the matching value from that list. Does this make sense? I'm not sure how to go about it.

user3483203 · Accepted Answer

You can take a similar approach of creating a new DataFrame, then use it to map the names back onto the original structure:

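# Flatten every row's list of {'code', 'name'} dicts into one DataFrame
theme = pd.DataFrame([val for pair in df['mjtheme_namecode'].values for val in pair])
# Treat blank names as missing, drop them, and build a code -> name lookup
mapper = theme.drop_duplicates().replace('', np.nan).dropna().set_index('code')['name'].to_dict()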
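
To make the lookup step concrete, here is a minimal sketch on a made-up three-row sample (the `sample` Series is hypothetical and only stands in for df['mjtheme_namecode']; the names are taken from the output shown in the question). It shows the kind of dictionary that mapper ends up being:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['mjtheme_namecode']: lists of code/name dicts,
# some with blank names.
sample = pd.Series([
    [{'code': '8', 'name': 'Human development'}, {'code': '10', 'name': ''}],
    [{'code': '10', 'name': 'Rural development'}],
    [{'code': '8', 'name': ''}],
])

# One row per dict, across all lists.
theme = pd.DataFrame([val for pair in sample for val in pair])

# Blank names become NaN and are dropped; what remains maps code -> name.
mapper = (
    theme.replace('', np.nan)
         .dropna()
         .drop_duplicates()
         .set_index('code')['name']
         .to_dict()
)
print(mapper)
```

Note that a code whose name is blank everywhere never makes it into mapper, so a direct mapper[code] lookup would raise a KeyError for it.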

Using a list comprehension to put it all together:

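# Rebuild each row's list, looking up every name by its code
s = pd.Series(
    [[{'code': i['code'], 'name': mapper[i['code']]}
        for i in t] for t in df.mjtheme_namecode],
    index=df.index,  # keep the original index so s aligns on reassignment
)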

s.head(13)

0     [{'code': '8', 'name': 'Human development'}, {...
1     [{'code': '1', 'name': 'Economic management'},...
2     [{'code': '5', 'name': 'Trade and integration'...
3     [{'code': '7', 'name': 'Social dev/gender/incl...
4     [{'code': '5', 'name': 'Trade and integration'...
5     [{'code': '6', 'name': 'Social protection and ...
6     [{'code': '2', 'name': 'Public sector governan...
7     [{'code': '11', 'name': 'Environment and natur...
8     [{'code': '10', 'name': 'Rural development'}, ...
9     [{'code': '2', 'name': 'Public sector governan...
10    [{'code': '10', 'name': 'Rural development'}, ...
11    [{'code': '10', 'name': 'Rural development'}, ...
12    [{'code': '4', 'name': 'Financial and private ...
dtype: object

As you can see, row 12 (the last row shown) has been filled in correctly, along with every other blank name. You can then assign s back to the column in your original DataFrame.
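
As a rough sketch of the reassignment step, the following uses a small made-up DataFrame (`df_demo`, its index values, and code '99' are all hypothetical) and plain Python for the lookup, closer to the loop the question describes. It swaps the direct lookup for mapper.get so that a code with no name anywhere keeps its blank instead of raising a KeyError; that fallback is an assumption about the desired behaviour, not part of the answer above.

```python
import pandas as pd

# Hypothetical data with a non-default index to show the result stays aligned.
df_demo = pd.DataFrame(
    {
        'mjtheme_namecode': [
            [{'code': '8', 'name': 'Human development'}, {'code': '10', 'name': ''}],
            [{'code': '10', 'name': 'Rural development'}],
            [{'code': '8', 'name': ''}, {'code': '99', 'name': ''}],
        ]
    },
    index=[10, 20, 30],
)

# code -> name lookup built only from entries that have a non-blank name.
mapper = {
    d['code']: d['name']
    for row in df_demo['mjtheme_namecode']
    for d in row
    if d['name']
}

def fill_names(row):
    """Return a new list of dicts with blank names filled from mapper."""
    return [
        {'code': d['code'], 'name': mapper.get(d['code'], d['name'])}
        for d in row
    ]

# Series.apply preserves the index, so the column can be overwritten in place.
df_demo['mjtheme_namecode'] = df_demo['mjtheme_namecode'].apply(fill_names)
print(df_demo['mjtheme_namecode'].tolist())
```

Because apply works row by row on the existing column, no reindexing is needed to put the filled values back.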
