Reputation: 875
I am trying to fill empty columns in one Pandas dataframe with columns from another Pandas dataframe, based on certain conditions.
First table is audit_records_df
and it looks like this:
id | audit_type | audit_date | maliciousness_level | suspicious_counts |
---|---|---|---|---|
123456 | Unknown | 2/5/21 | NaN | NaN |
123456 | Cleared | 2/6/21 | NaN | NaN |
123456 | Terminated | 2/8/21 | NaN | NaN |
345678 | Terminated | 2/5/21 | NaN | NaN |
Second table is spam_profile_most_recent_notes
:
id | audit_type | audit_date | maliciousness_level | suspicious_counts | ire_1 | ire_2 |
---|---|---|---|---|---|---|
123456 | Unknown | 2/5/21 | high | 3 | 222 | 222 |
345678 | Terminated | 2/5/21 | high | 6 | 222 | 222 |
Note that unlike audit_records_df
table (where an id
could have multiple rows), in spam_profile_most_recent_notes
table each id
will only have 1 row.
I am trying to fill columns maliciousness_level
and suspicious_counts
in audit_records_df
table with values from columns of the same name from spam_profile_most_recent_notes
table (we need to ignore columns ire_1
and ire_2
), based on the following criteria:
audit_records_df
table where id
matches id
in spam_profile_most_recent_notes
, fill the maliciousness_level
and suspicious_counts
in audit_records_df
table with corresponding values from spam_profile_most_recent_notes
table in the rows where id
matches.After the filling, the audit_records_df
table should look like this:
id | audit_type | audit_date | maliciousness_level | suspicious_counts |
---|---|---|---|---|
123456 | Unknown | 2/5/21 | high | 3 |
123456 | Cleared | 2/6/21 | high | 3 |
123456 | Terminated | 2/8/21 | high | 3 |
345678 | Terminated | 2/5/21 | high | 6 |
I've seen some slightly similar questions like this one: Conditionally fill column with value from another DataFrame based on row match in Pandas. However, all I've seen were regarding filling in value in 1 column, unlike my use case where I have to fill in values for multiple columns.
Any advice would be greatly appreciated.
Upvotes: 1
Views: 1223
Reputation: 46
I think you can solve this by:
malicious_level
and suspicious_counts
columns from the audit_record_df
table.id
, malicious_level
and suspicious_counts
columns from the spam_profile_most_recent_notes
id
audit_record_df.drop(['maliciousness_level', 'suspicious_counts'], axis=1, inplace=True)
print(audit_record_df)
id audit_type audit_date
0 123456 Unknown 2/5/21
1 123456 Cleared 2/6/21
2 123456 Terminated 2/8/21
3 345678 Terminated 2/5/21
spam_profile_most_recent_notes = spam_profile_most_recent_notes[['id', 'maliciousness_level', 'suspicious_counts']]
print(spam_profile_most_recent_notes)
id audit_type audit_date
0 123456 Unknown 2/5/21
1 123456 Cleared 2/6/21
2 123456 Terminated 2/8/21
3 345678 Terminated 2/5/21
new_df = audit_record_df.merge(spam_profile_most_recent_notes, on='id', how='left')
print(new_df)
id audit_type audit_date maliciousness_level suspicious_counts
0 123456 Unknown 2/5/21 high 3
1 123456 Cleared 2/6/21 high 3
2 123456 Terminated 2/8/21 high 3
3 345678 Terminated 2/5/21 high 6
Upvotes: 3