Reputation: 89
I have two data frames df1
and df2
with the following columns:
A = ['id', 'a_id', 'c', 'd']
B = ['a_id', 'e', 'f']
For a_id
in df1
matching a_id
in df2
, I need to add e
from df2
to df1
row. Df1
is around 7 million rows and df2 is around 15k. I tried the code below but it takes too long. I was wondering if there's a better solution that could speed things up a bit and more memory efficient.
def map_df(row):
for i, r in df2.iterrows():
if row['a_id'] == r['a_id']:
return row2['part_mean_correctness']
df1['e'] = df1.apply (lambda row: map_df(row), axis=1)
Upvotes: 0
Views: 265
Reputation: 26
Not sure I completely understood your issue but are you trying to do a merge? If so, then your code should just look like:
import pandas as pd
df1 = pd.DataFrame({'a_id':[1, 2, 3, 4, 5], 'c': ["foo", "bar", "baz", "qux", "foobar"]})
df2 = pd.DataFrame({'a_id':[3, 4], 'e': ["bar", "baz"]})
df1 = df1.merge(df2, on=["a_id"], how="left")
Then df1
should look like
a_id c e
0 1 foo NaN
1 2 bar NaN
2 3 baz bar
3 4 qux baz
4 5 foobar NaN
Upvotes: 1