Reputation: 155

Pandas: List of maximum values of difference from previous rows in new column

I want to add a new column 'BEST' to this dataframe, which contains a list of the names of the columns which meet these criteria:

Subtract from the current value in each column the value in the row that is 2 rows back
The column that has the highest result of this subtraction will be listed in 'BEST'
If more more than one column shares the same highest result, they all get listed
If all columns have the same result, they all get listed

Input:

Desired result:

   A  B  C        BEST
0  1  2  3          []
1  3  2  1          []
2  5  2  4       ['A']
3 -3  0  2       ['C']
4  6  3  4  ['A', 'B']

Explanation for the result in the new column 'BEST' for each row:

Rows 0 and 1: Empty list in 'BEST', because 2 rows back does not exist.
Row 2: Calculate for column A: 5 - 1 = 4, for column B: 2 - 2 = 0, for column C: 4 - 3 = 1 -> highest is 'A'
Row 3: Calculate for column A: -3 - 1 = -4, for column B: 0 - 2 = -2, for column C: 2 - 3 = -1 -> highest is 'C'
Row 4: Calculate for column A: 6 - 5 = 1, for column B: 3 - 2 = 1, for column C: 4 - 4 = 0 -> highest are both ['A', 'B']

Code to generate the dataframe:

import io, pandas as pd

df = io.StringIO("""
A|B|C
1|2|3
3|2|1
5|2|4
-3|0|2
6|3|4
""")
df = pd.read_csv(df, sep='|')

Upvotes: 0

Answers (3)

Allen Qin

Reputation: 19947

First use shift and subtract to get the diff, then replace the maximum values with the column name and drop the others.

df['BEST'] = (
    df.subtract(df.shift(2))
    .apply(lambda x: [df.columns[i] for i,e in enumerate(x) if e==max(x)], axis=1)
)

    A   B   C   BEST
0   1   2   3   []
1   3   2   1   []
2   5   2   4   [A]
3   -3  0   2   [C]
4   6   3   4   [A, B]

Upvotes: 3

Crystal L

Reputation: 571

You can use shift(), apply() and np.where to do so. Try this:

df['Best'] = (df - df.shift(2)).iloc[2:].apply(lambda x: list(df.columns[np.where(x==max(x))]), axis=1)

Output:

print(df)

A   B   C   Best
0   1   2   3   NaN
1   3   2   1   NaN
2   5   2   4   [A]
3   -3  0   2   [C]
4   6   3   4   [A, B]

Upvotes: 2

Anurag Dabas

Reputation: 24314

Try:

maximum=df.sub(df.shift(2)).max(skipna=False,axis=1)
#finded the maximum value
df['Best']=(df.where(df.sub(df.shift(2)).isin(maximum))
          .T
          .apply(lambda x:x.dropna().index.tolist()))

Output of df:

    A   B   C   Best
0   1   2   3   []
1   3   2   1   []
2   5   2   4   [A]
3   -3  0   2   [C]
4   6   3   4   [A, B]

Upvotes: 3

Pandas: List of maximum values of difference from previous rows in new column

Answers (3)

Related Questions