Stanleyrr
Stanleyrr

Reputation: 875

Fill multiple columns with value from another dataframe

I am trying to fill empty columns in one Pandas dataframe with columns from another Pandas dataframe, based on certain conditions.

First table is audit_records_df and it looks like this:

id audit_type audit_date maliciousness_level suspicious_counts
123456 Unknown 2/5/21 NaN NaN
123456 Cleared 2/6/21 NaN NaN
123456 Terminated 2/8/21 NaN NaN
345678 Terminated 2/5/21 NaN NaN

Second table is spam_profile_most_recent_notes:

id audit_type audit_date maliciousness_level suspicious_counts ire_1 ire_2
123456 Unknown 2/5/21 high 3 222 222
345678 Terminated 2/5/21 high 6 222 222

Note that unlike audit_records_df table (where an id could have multiple rows), in spam_profile_most_recent_notes table each id will only have 1 row.

I am trying to fill columns maliciousness_level and suspicious_counts in audit_records_df table with values from columns of the same name from spam_profile_most_recent_notes table (we need to ignore columns ire_1 and ire_2), based on the following criteria:

After the filling, the audit_records_df table should look like this:

id audit_type audit_date maliciousness_level suspicious_counts
123456 Unknown 2/5/21 high 3
123456 Cleared 2/6/21 high 3
123456 Terminated 2/8/21 high 3
345678 Terminated 2/5/21 high 6

I've seen some slightly similar questions like this one: Conditionally fill column with value from another DataFrame based on row match in Pandas. However, all I've seen were regarding filling in value in 1 column, unlike my use case where I have to fill in values for multiple columns.

Any advice would be greatly appreciated.

Upvotes: 1

Views: 1223

Answers (1)

Safuan Mazlan
Safuan Mazlan

Reputation: 46

I think you can solve this by:

  1. Drop the malicious_level and suspicious_counts columns from the audit_record_df table.
  2. Keep only id, malicious_level and suspicious_counts columns from the spam_profile_most_recent_notes
  3. Perform a left inner join on both tables using id
audit_record_df.drop(['maliciousness_level', 'suspicious_counts'], axis=1, inplace=True)
print(audit_record_df)

       id  audit_type audit_date
0  123456     Unknown     2/5/21
1  123456     Cleared     2/6/21
2  123456  Terminated     2/8/21
3  345678  Terminated     2/5/21

spam_profile_most_recent_notes = spam_profile_most_recent_notes[['id', 'maliciousness_level', 'suspicious_counts']]
print(spam_profile_most_recent_notes)

       id  audit_type audit_date
0  123456     Unknown     2/5/21
1  123456     Cleared     2/6/21
2  123456  Terminated     2/8/21
3  345678  Terminated     2/5/21

new_df = audit_record_df.merge(spam_profile_most_recent_notes, on='id', how='left')
print(new_df)

       id  audit_type audit_date maliciousness_level  suspicious_counts
0  123456     Unknown     2/5/21                high                  3
1  123456     Cleared     2/6/21                high                  3
2  123456  Terminated     2/8/21                high                  3
3  345678  Terminated     2/5/21                high                  6   

Upvotes: 3

Related Questions