Reputation: 27
Forgive me if this is a repeat question, but I can't find the answer and I'm not even sure what the right terminology is.
I have two dataframes that don't have completely matching rows or columns. Something like:
Balances = pd.DataFrame({'Name':['Alan','Barry','Carl', 'Debbie', 'Elaine'],
'Age Of Debt':[1,4,3,7,2],
'Balance':[500,5000,300,100,3000],
'Payment Due Date':[1,1,30,14,1]})
Payments = pd.DataFrame({'Name':['Debbie','Alan','Carl'],
'Balance':[50,100,30]})
I want to subtract the Payments dataframe from the Balances dataframe based on Name, so essentially a new dataframe that looks like this:
pd.DataFrame({'Name':['Alan','Barry','Carl', 'Debbie', 'Elaine'],
'Age Of Debt':[1,4,3,7,2],
'Balance':[400,5000,270,50,3000],
'Payment Due Date':[1,1,30,14,1]})
I can imagine having to iterate over the rows of Balances, but when both dataframes are very large I don't think it's very efficient.
Upvotes: 1
Views: 37
Reputation: 195478
You can use .merge
:
tmp = pd.merge(Balances, Payments, on="Name", how="outer").fillna(0)
Balances["Balance"] = tmp["Balance_x"] - tmp["Balance_y"]
print(Balances)
Prints:
Name Age Of Debt Balance Payment Due Date
0 Alan 1 400.0 1
1 Barry 4 5000.0 1
2 Carl 3 270.0 30
3 Debbie 7 50.0 14
4 Elaine 2 3000.0 1
Upvotes: 1