Fill multiple columns with value from another dataframe

Question

I am trying to fill empty columns in one Pandas dataframe with columns from another Pandas dataframe, based on certain conditions.

First table is audit_records_df and it looks like this:

id	audit_type	audit_date	maliciousness_level	suspicious_counts
123456	Unknown	2/5/21	NaN	NaN
123456	Cleared	2/6/21	NaN	NaN
123456	Terminated	2/8/21	NaN	NaN
345678	Terminated	2/5/21	NaN	NaN

Second table is spam_profile_most_recent_notes:

id	audit_type	audit_date	maliciousness_level	suspicious_counts	ire_1	ire_2
123456	Unknown	2/5/21	high	3	222	222
345678	Terminated	2/5/21	high	6	222	222

Note that unlike audit_records_df table (where an id could have multiple rows), in spam_profile_most_recent_notes table each id will only have 1 row.

I am trying to fill columns maliciousness_level and suspicious_counts in audit_records_df table with values from columns of the same name from spam_profile_most_recent_notes table (we need to ignore columns ire_1 and ire_2), based on the following criteria:

For rows in audit_records_df table where id matches id in spam_profile_most_recent_notes, fill the maliciousness_level and suspicious_counts in audit_records_df table with corresponding values from spam_profile_most_recent_notes table in the rows where id matches.

After the filling, the audit_records_df table should look like this:

id	audit_type	audit_date	maliciousness_level	suspicious_counts
123456	Unknown	2/5/21	high	3
123456	Cleared	2/6/21	high	3
123456	Terminated	2/8/21	high	3
345678	Terminated	2/5/21	high	6

I've seen some slightly similar questions like this one: Conditionally fill column with value from another DataFrame based on row match in Pandas. However, all I've seen were regarding filling in value in 1 column, unlike my use case where I have to fill in values for multiple columns.

Any advice would be greatly appreciated.

Safuan Mazlan · Accepted Answer

I think you can solve this by:

Drop the malicious_level and suspicious_counts columns from the audit_record_df table.
Keep only id, malicious_level and suspicious_counts columns from the spam_profile_most_recent_notes
Perform a left inner join on both tables using id

audit_record_df.drop(['maliciousness_level', 'suspicious_counts'], axis=1, inplace=True)
print(audit_record_df)

       id  audit_type audit_date
0  123456     Unknown     2/5/21
1  123456     Cleared     2/6/21
2  123456  Terminated     2/8/21
3  345678  Terminated     2/5/21

spam_profile_most_recent_notes = spam_profile_most_recent_notes[['id', 'maliciousness_level', 'suspicious_counts']]
print(spam_profile_most_recent_notes)

       id  audit_type audit_date
0  123456     Unknown     2/5/21
1  123456     Cleared     2/6/21
2  123456  Terminated     2/8/21
3  345678  Terminated     2/5/21

new_df = audit_record_df.merge(spam_profile_most_recent_notes, on='id', how='left')
print(new_df)

       id  audit_type audit_date maliciousness_level  suspicious_counts
0  123456     Unknown     2/5/21                high                  3
1  123456     Cleared     2/6/21                high                  3
2  123456  Terminated     2/8/21                high                  3
3  345678  Terminated     2/5/21                high                  6

Fill multiple columns with value from another dataframe

Answers (1)

Related Questions