What is the efficient way to perform row wise match in pandas?

Question

Assume 2 data frames (df_a and df_b). I want to traverse row-wise and check for an exact match in the value column. If a match is found, I want the index of the matched row to be added in df_a.

df_a

| Index | Name | Value |
|-------|------|-------|
| 1     | Bon  | 124   |
| 2     | Bon  | 412   |
| 3     | Jaz  | 634   |
| 4     | Cal  | 977   |
| 5     | Cal  | 412   |
| 6     | Bon  | 412   |

df_b

| Index | Name | Value |
|-------|------|-------|
| 1     | Cal  | 977   |
| 2     | Jaz  | 634   |
| 3     | Lan  | 650   |
| 4     | Bon  | 412   |

Expected Output df

| Index | Name | Value | Index_in_df_b |
|-------|------|-------|---------------|
| 1     | Bon  | 124   | Unmatched     |
| 2     | Bon  | 412   | 4             |
| 3     | Jaz  | 634   | 2             |
| 4     | Cal  | 977   | 1             |
| 5     | Cal  | 412   | Unmatched     |
| 6     | Bon  | 412   | Unmatched     |

Existing Solution:

Create a column --> df_a['Index_in_df_b'] = 'Unmatched'

Then I had 3 solutions:

Started using iterrows. This solution took a lot of time to process so we shifted to using .loc. This solution took about 20 minutes to process the data frames with over 7 columns and around 15000 rows in each of them. Then we started using .at. This seems to be by far the best way to process it. It took ~3 minutes to process the same data frame as mentioned above. This is the current solution.

for index_a in df_a.index:
    for index_b in df_b.index:
        if df_a.at[index_a,'Name'] == df_b.at[index_b, 'Name']:
            # Processing logic to check for value

I'm not sure if apply can be used since 2 data frames and their row-wise details are necessary and also not sure about vectorization methods. Is there a faster way to proceed with this problem or is the current solution apt for this?

What is the efficient way to perform row wise match in pandas?

Answers (1)

Related Questions