Modify pandas data frame values based on condition

Question

Given a data frame with 9,000+ records (rows), a column with unique IDs, and each ID can have multiple records as shown below.

df.head(4)

Unique_ID	Record_1	Record_2
AN5001	90.0	ten
AN5002	90.0	five
AN5001	95.0	five
AN5003	60.0	forty

There are 360 unique IDs. However, about half of them need to be corrected. Consider below df_corrected_ID.head(3)

Unique_ID_old	Unique_ID_new
AN5001	AN5010
AN5002	AN5002
AN5003	AN5011

How would you, most efficiently, fix the Unique ID in the main df with 9,000+ records using the df_corrected_ID data frame?

So, check if Unique_ID_old is present in df['Unique_ID'] column and if it is, replace it with Unique_ID_new from df_corrected_ID.

How would one then check change occurred correctly (for example, just show the difference between the Unique_IDs -- say after converting the original and updated columns to lists and then, list(set(Unique_ID) - set(Unique_ID_new)).

It's okay to add another new column to original df if needed with corrected IDs, as long as the order is maintained an none of the records are changed.

Thanks!

jprebys · Accepted Answer

You can use Series.replace for this. Make your corrected ID DataFrame into a dictionary that maps old IDs to new ones, and you can use it to update your existing column or create a new one.

update_dict = dict(zip(df_corrected_ID['Unique_ID_old'], df_corrected_ID['Unique_ID_new'])

# to create a new column
df['Unique_ID_updated'] = df['Unique_ID'].replace(update_dict)

# to update the existing column
df['Unique_ID'] = df['Unique_ID'].replace(update_dict)

Modify pandas data frame values based on condition

Answers (2)

Related Questions