Reputation: 35
I have two dataframes where one of the column names is "Block Name" and has unique values in column, but they may not be exactly in order and there might be some mismatch.
DFS1:
Block Name val1 val2
B1 10 100
B2 20 200
B3 30 300
DFS2:
Block Name val1 val2
B1 100 1000
B4 350 3500
B2 400 5600
I'd like to take each block of "Block Name" in a for loop from DFS1, and if it matches with DFS2 then print two rows one from DFS1 which match as B1 and one from DFS2 which match as B1, similar for other matching rows as B2, etc., if it doesn't match just print that it is a new row from either of the tables.
I'm writing code as shown below, but I cannot pass variable in dfs1.loc command.
for b1 in dfs1['Block Name']:
print(b1)
for b2 in dfs2['Block Name']:
print(b2)
if b1 == b2:
print(dfs1.loc[dfs1['Block Name'] == 'b2'])
I couldn't find any variable related documentation on Google for dfs.loc, most of them are with fixed names.
output should be :
B1 10 100
B1 100 1000
B2 20 200
B2 400 5600
Upvotes: 0
Views: 1281
Reputation: 3011
I am not going to revise your nested loop because it has too many problems (e.g., you need to include all row elements in the print()
command if you want to print them on the same line) and it might not even work if the order of rows changes in the original two data frames.
The data frames should be sorted before you try to print the rows in that order. Here is a simple answer that just relies on Pandas methods:
import pandas as pd
from __future__ import print_function
result = pd.concat([dfs1,dfs2],axis=0).sort_values(['Block Name','val1','val2']).reset_index(drop=True)
print(result)
Upvotes: 3