Anna Shevtsova
Anna Shevtsova

Reputation: 123

Replace values in column based on same or closer values from another columns pandas

I have two dataframes look like:

DF1:

Score 1 Avg_life
4.033986 3482.0
9.103820 758.0
-1.34432 68000.0
218670040.0 33708.0
2.291000 432.0

DF2:

Score 1 life
3.033986 0
9.103820 0
9.103820 0
7.350981 0
1.443400 0
9.103820 0
-1.134486 0

The 0 values in "life" from second dataframe should be replaced by the values from "avg life" from the first dataframe if the columns "Score 1" from both dataframes are the same. Btw if there are no the same values, we take the closest value from "Score1" in DF1 to value from "Score1" in DF2.

The problem is in the word "the closest".

For example: I don't have the value "3.033986" in DF1 in "Score 1", but I want to take the closest value to this - "4.033986" and change 0 in "life" column to "3482.0" from "Avg_life" because "3.033986" is closer to "4.033986".

The result should be like this:

DF_result:

Score 1 life
3.033986 3482.0
9.103820 758.0
9.103820 758.0
7.350981 758.0
1.443400 432.0
9.103820 758.0
-1.134486 68000.0

I hope I made it clear....

Thanks for all help!

Upvotes: 1

Views: 1526

Answers (2)

Henry Ecker
Henry Ecker

Reputation: 35686

If order in the result set doesn't matter, we can sort both DataFrames with sort_values then merge_asof to get the nearest match.

"nearest" search:

selects the row in the right DataFrame whose ‘on’ key is closest in absolute distance to the left’s key.

df3 = pd.merge_asof(
    df2.sort_values('Score 1'),
    df1.sort_values('Score 1'),
    on='Score 1',
    direction='nearest'
)

    Score 1  life  Avg_life
0 -1.134486     0   68000.0
1  1.443400     0     432.0
2  3.033986     0     432.0
3  7.350981     0     758.0
4  9.103820     0     758.0
5  9.103820     0     758.0
6  9.103820     0     758.0

drop and rename can be used to clean up the output:

df3 = pd.merge_asof(
    df2.sort_values('Score 1'),
    df1.sort_values('Score 1'),
    on='Score 1',
    direction='nearest'
).drop(columns='life').rename(columns={'Avg_life': 'life'})

    Score 1     life
0 -1.134486  68000.0
1  1.443400    432.0
2  3.033986    432.0
3  7.350981    758.0
4  9.103820    758.0
5  9.103820    758.0
6  9.103820    758.0

If order is important, we can sort df2 outside of merge_asof and overwrite the result set with the index of df2 with set_axis then sort_index to restore the initial order:

df2_sorted = df2.sort_values('Score 1')
df3 = (
    pd.merge_asof(df2_sorted,
                  df1.sort_values('Score 1'),
                  on='Score 1',
                  direction='nearest')
        .drop(columns='life')
        .rename(columns={'Avg_life': 'life'})
        .set_axis(df2_sorted.index, axis=0)
        .sort_index()
)

    Score 1     life
0  3.033986    432.0
1  9.103820    758.0
2  9.103820    758.0
3  7.350981    758.0
4  1.443400    432.0
5  9.103820    758.0
6 -1.134486  68000.0

Upvotes: 4

piterbarg
piterbarg

Reputation: 8219

First we find the value if df1['Score1'] that is the closest to each value in df2['Score1'], and put it into df2['match']:

df2['match'] = df2['Score1'].apply(lambda s : min(df1['Score1'].values, key = lambda x: abs(x-s)))

df2 now looks like this


    Score1   life   match
0   3.033986    0   2.29100
1   9.103820    0   9.10382
2   9.103820    0   9.10382
3   7.350981    0   9.10382
4   1.443400    0   2.29100
5   9.103820    0   9.10382
6   -1.134486   0   -1.34432

Now we just merge on match, drop unneeded columns and rename others

(df2[['match', 'Score1']].merge(df1, how = 'left', left_on = 'match', right_on = 'Score1', suffixes = ['','_2'])
    .rename(columns = {'Avg_life':'life'})
    .drop(columns = ['match', 'Score1_2'])
)

output


    Score1      life
0   3.033986    432.0
1   9.103820    758.0
2   9.103820    758.0
3   7.350981    758.0
4   1.443400    432.0
5   9.103820    758.0
6   -1.134486   68000.0

Upvotes: 4

Related Questions