Reputation: 617
Assume 2 data frames (df_a and df_b). I want to traverse row-wise and check for an exact match in the value column. If a match is found, I want the index of the matched row to be added in df_a.
df_a
| Index | Name | Value |
|-------|------|-------|
| 1 | Bon | 124 |
| 2 | Bon | 412 |
| 3 | Jaz | 634 |
| 4 | Cal | 977 |
| 5 | Cal | 412 |
| 6 | Bon | 412 |
df_b
| Index | Name | Value |
|-------|------|-------|
| 1 | Cal | 977 |
| 2 | Jaz | 634 |
| 3 | Lan | 650 |
| 4 | Bon | 412 |
Expected Output df
| Index | Name | Value | Index_in_df_b |
|-------|------|-------|---------------|
| 1 | Bon | 124 | Unmatched |
| 2 | Bon | 412 | 4 |
| 3 | Jaz | 634 | 2 |
| 4 | Cal | 977 | 1 |
| 5 | Cal | 412 | Unmatched |
| 6 | Bon | 412 | Unmatched |
Existing Solution:
Create a column --> df_a['Index_in_df_b'] = 'Unmatched'
Then I had 3 solutions:
Started using iterrows
. This solution took a lot of time to process so we shifted to using .loc
. This solution took about 20 minutes to process the data frames with over 7 columns and around 15000 rows in each of them. Then we started using .at
. This seems to be by far the best way to process it. It took ~3 minutes to process the same data frame as mentioned above. This is the current solution.
for index_a in df_a.index:
for index_b in df_b.index:
if df_a.at[index_a,'Name'] == df_b.at[index_b, 'Name']:
# Processing logic to check for value
I'm not sure if apply
can be used since 2 data frames and their row-wise details are necessary and also not sure about vectorization methods. Is there a faster way to proceed with this problem or is the current solution apt for this?
Upvotes: 0
Views: 575
Reputation: 862591
Use if need match Name
and Value
columns use DataFrame.merge
with left join and convert index to column Index_in_df_b
:
df2 = df_b.rename_axis('Index_in_df_b').reset_index()
df = df_a.merge(df2, on=['Name','Value'], how='left').fillna({'Index_in_df_b':'Unmatched'})
print (df)
Name Value Index_in_df_b
0 Bon 124 Unmatched
1 Bon 412 4
2 Jaz 634 2
3 Cal 977 1
4 Cal 412 Unmatched
If need match only by Value
column output is different in sample data:
df2 = df_b.rename_axis('Index_in_df_b').reset_index()[['Index_in_df_b','Value']]
df = df_a.merge(df2, on='Value', how='left').fillna({'Index_in_df_b':'Unmatched'})
print (df)
Name Value Index_in_df_b
0 Bon 124 Unmatched
1 Bon 412 4
2 Jaz 634 2
3 Cal 977 1
4 Cal 412 4
If need match only by Name
column output is different in sample data:
df2 = df_b.rename_axis('Index_in_df_b').reset_index()[['Index_in_df_b','Name']]
df = df_a.merge(df2, on='Name', how='left').fillna({'Index_in_df_b':'Unmatched'})
print (df)
Name Value Index_in_df_b
0 Bon 124 4
1 Bon 412 4
2 Jaz 634 2
3 Cal 977 1
4 Cal 412 1
Upvotes: 2