SemiCode
SemiCode

Reputation: 35

Print specific row based on variable in Python dataframe

I have two dataframes where one of the column names is "Block Name" and has unique values in column, but they may not be exactly in order and there might be some mismatch.

DFS1: 
Block Name  val1 val2
B1           10  100
B2           20  200
B3           30  300

DFS2:
Block Name  val1 val2
B1          100 1000
B4          350 3500
B2          400 5600

I'd like to take each block of "Block Name" in a for loop from DFS1, and if it matches with DFS2 then print two rows one from DFS1 which match as B1 and one from DFS2 which match as B1, similar for other matching rows as B2, etc., if it doesn't match just print that it is a new row from either of the tables.

I'm writing code as shown below, but I cannot pass variable in dfs1.loc command.

for b1 in dfs1['Block Name']:
   print(b1)
   for b2 in dfs2['Block Name']:
       print(b2)
       if b1 == b2:
            print(dfs1.loc[dfs1['Block Name'] == 'b2'])

I couldn't find any variable related documentation on Google for dfs.loc, most of them are with fixed names.

output should be :

B1 10 100
B1 100 1000
B2  20  200
B2  400 5600

Upvotes: 0

Views: 1281

Answers (1)

AlexK
AlexK

Reputation: 3011

I am not going to revise your nested loop because it has too many problems (e.g., you need to include all row elements in the print() command if you want to print them on the same line) and it might not even work if the order of rows changes in the original two data frames.

The data frames should be sorted before you try to print the rows in that order. Here is a simple answer that just relies on Pandas methods:

import pandas as pd
from __future__ import print_function
result = pd.concat([dfs1,dfs2],axis=0).sort_values(['Block Name','val1','val2']).reset_index(drop=True)
print(result)

Upvotes: 3

Related Questions