Sheron

Reputation: 615

Python filling in missing values based on existing data

I have a dataframe containing one missing value.

   exam_id   exam  
0        1   french   
1        2   italian 
2        3   chinese  
3        4   english  
4        3   chinese  
5        5   russian  
6        1   french       
7      NaN   russian   
8        1   french   
9        2   italian

I want to fill in the missing exam_id for the russian exam based on existing information. Since the exam_id for russian is 5, I would like the same value assigned to the missing one.

Upvotes: 1

Views: 207

Answers (2)

3novak

Reputation: 2544

Beware that this approach does more than fill missing values: it overwrites the whole exam_id column. On the plus side, that also takes care of miscodings (e.g., "french" being coded as 3). Build a dictionary that maps each language to its ID, then apply it with map to create a new exam_id column. Do note, however, that if a language doesn't appear in the dictionary (e.g., "French", since the lookup is case-sensitive), it will produce a missing value.

language_test_map = {'french': 1,
                     'italian': 2,
                     'chinese': 3,
                     'english': 4,
                     'russian': 5}

df['exam_id'] = df['exam'].map(language_test_map)
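If you would rather not type the dictionary by hand, or you want to leave the existing IDs untouched, here is a minimal sketch of a variant. It assumes every exam has at least one row with a known exam_id and that the first known value per exam is the right one; the `known` helper name is just for illustration.

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt so the example runs on its own
df = pd.DataFrame({
    "exam_id": [1, 2, 3, 4, 3, 5, 1, np.nan, 1, 2],
    "exam": ["french", "italian", "chinese", "english", "chinese",
             "russian", "french", "russian", "french", "italian"],
})

# Derive the mapping from rows where exam_id is known
# (assumption: the first known value per exam is the correct one)
known = df.dropna(subset=["exam_id"]).drop_duplicates("exam")
language_test_map = known.set_index("exam")["exam_id"].to_dict()

# Fill only the missing entries; existing values are left as they are
df["exam_id"] = df["exam_id"].fillna(df["exam"].map(language_test_map))
```

Unlike the plain map above, this will not correct miscoded rows, because fillna only touches the NaN entries.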

Upvotes: 1

akuiper

Reputation: 214927

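You can group your data frame by exam, then do a ffill + bfill within each group, in case there are missing values before and after the existing value. Applying both fills inside the group keeps a value from leaking in from a different exam:

df['exam_id'] = df.groupby('exam')['exam_id'].transform(lambda s: s.ffill().bfill())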
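For reference, a self-contained sketch of the group-wise fill on the question's data. It assumes a reasonably recent pandas and does both fills on the exam_id column inside each group via transform, so the result lines up with the original index and no value is borrowed from another exam.

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt so the example runs on its own
df = pd.DataFrame({
    "exam_id": [1, 2, 3, 4, 3, 5, 1, np.nan, 1, 2],
    "exam": ["french", "italian", "chinese", "english", "chinese",
             "russian", "french", "russian", "french", "italian"],
})

# Forward-fill, then back-fill, separately within each exam group
df["exam_id"] = (
    df.groupby("exam")["exam_id"]
      .transform(lambda s: s.ffill().bfill())
)
```

An exam whose exam_id is missing in every row stays NaN, since there is nothing in its group to copy from.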

Upvotes: 3
