Reputation: 123
I have two dataframes look like:
DF1:
Score 1 | Avg_life |
---|---|
4.033986 | 3482.0 |
9.103820 | 758.0 |
-1.34432 | 68000.0 |
218670040.0 | 33708.0 |
2.291000 | 432.0 |
DF2:
Score 1 | life |
---|---|
3.033986 | 0 |
9.103820 | 0 |
9.103820 | 0 |
7.350981 | 0 |
1.443400 | 0 |
9.103820 | 0 |
-1.134486 | 0 |
The 0 values in "life" from second dataframe should be replaced by the values from "avg life" from the first dataframe if the columns "Score 1" from both dataframes are the same. Btw if there are no the same values, we take the closest value from "Score1" in DF1 to value from "Score1" in DF2.
The problem is in the word "the closest".
For example: I don't have the value "3.033986" in DF1 in "Score 1", but I want to take the closest value to this - "4.033986" and change 0 in "life" column to "3482.0" from "Avg_life" because "3.033986" is closer to "4.033986".
The result should be like this:
DF_result:
Score 1 | life |
---|---|
3.033986 | 3482.0 |
9.103820 | 758.0 |
9.103820 | 758.0 |
7.350981 | 758.0 |
1.443400 | 432.0 |
9.103820 | 758.0 |
-1.134486 | 68000.0 |
I hope I made it clear....
Thanks for all help!
Upvotes: 1
Views: 1526
Reputation: 35686
If order in the result set doesn't matter, we can sort both DataFrames with sort_values then merge_asof to get the nearest
match.
"nearest"
search:
selects the row in the right DataFrame whose ‘on’ key is closest in absolute distance to the left’s key.
df3 = pd.merge_asof(
df2.sort_values('Score 1'),
df1.sort_values('Score 1'),
on='Score 1',
direction='nearest'
)
Score 1 life Avg_life
0 -1.134486 0 68000.0
1 1.443400 0 432.0
2 3.033986 0 432.0
3 7.350981 0 758.0
4 9.103820 0 758.0
5 9.103820 0 758.0
6 9.103820 0 758.0
drop and rename can be used to clean up the output:
df3 = pd.merge_asof(
df2.sort_values('Score 1'),
df1.sort_values('Score 1'),
on='Score 1',
direction='nearest'
).drop(columns='life').rename(columns={'Avg_life': 'life'})
Score 1 life
0 -1.134486 68000.0
1 1.443400 432.0
2 3.033986 432.0
3 7.350981 758.0
4 9.103820 758.0
5 9.103820 758.0
6 9.103820 758.0
If order is important, we can sort df2 outside of merge_asof and overwrite the result set with the index of df2
with set_axis then sort_index to restore the initial order:
df2_sorted = df2.sort_values('Score 1')
df3 = (
pd.merge_asof(df2_sorted,
df1.sort_values('Score 1'),
on='Score 1',
direction='nearest')
.drop(columns='life')
.rename(columns={'Avg_life': 'life'})
.set_axis(df2_sorted.index, axis=0)
.sort_index()
)
Score 1 life
0 3.033986 432.0
1 9.103820 758.0
2 9.103820 758.0
3 7.350981 758.0
4 1.443400 432.0
5 9.103820 758.0
6 -1.134486 68000.0
Upvotes: 4
Reputation: 8219
First we find the value if df1['Score1']
that is the closest to each value in df2['Score1']
, and put it into df2['match']
:
df2['match'] = df2['Score1'].apply(lambda s : min(df1['Score1'].values, key = lambda x: abs(x-s)))
df2
now looks like this
Score1 life match
0 3.033986 0 2.29100
1 9.103820 0 9.10382
2 9.103820 0 9.10382
3 7.350981 0 9.10382
4 1.443400 0 2.29100
5 9.103820 0 9.10382
6 -1.134486 0 -1.34432
Now we just merge on match
, drop unneeded columns and rename others
(df2[['match', 'Score1']].merge(df1, how = 'left', left_on = 'match', right_on = 'Score1', suffixes = ['','_2'])
.rename(columns = {'Avg_life':'life'})
.drop(columns = ['match', 'Score1_2'])
)
output
Score1 life
0 3.033986 432.0
1 9.103820 758.0
2 9.103820 758.0
3 7.350981 758.0
4 1.443400 432.0
5 9.103820 758.0
6 -1.134486 68000.0
Upvotes: 4