Reputation: 155
I want to add a new column 'BEST' to this dataframe, which contains a list of the names of the columns which meet these criteria:
Input:
A B C
0 1 2 3
1 3 2 1
2 5 2 4
3 -3 0 2
4 6 3 4
Desired result:
A B C BEST
0 1 2 3 []
1 3 2 1 []
2 5 2 4 ['A']
3 -3 0 2 ['C']
4 6 3 4 ['A', 'B']
Explanation for the result in the new column 'BEST' for each row:
Rows 0 and 1: Empty list in 'BEST', because 2 rows back does not exist.
Row 2: Calculate for column A: 5 - 1 = 4, for column B: 2 - 2 = 0, for column C: 4 - 3 = 1 -> highest is 'A'
Row 3: Calculate for column A: -3 - 1 = -4, for column B: 0 - 2 = -2, for column C: 2 - 3 = -1 -> highest is 'C'
Row 4: Calculate for column A: 6 - 5 = 1, for column B: 3 - 2 = 1, for column C: 4 - 4 = 0 -> highest are both ['A', 'B']
Code to generate the dataframe:
import io, pandas as pd
df = io.StringIO("""
A|B|C
1|2|3
3|2|1
5|2|4
-3|0|2
6|3|4
""")
df = pd.read_csv(df, sep='|')
Upvotes: 0
Views: 503
Reputation: 19947
First use shift
and subtract
to get the diff, then replace the maximum values with the column name and drop the others.
df['BEST'] = (
df.subtract(df.shift(2))
.apply(lambda x: [df.columns[i] for i,e in enumerate(x) if e==max(x)], axis=1)
)
A B C BEST
0 1 2 3 []
1 3 2 1 []
2 5 2 4 [A]
3 -3 0 2 [C]
4 6 3 4 [A, B]
Upvotes: 3
Reputation: 571
You can use shift()
, apply()
and np.where
to do so.
Try this:
df['Best'] = (df - df.shift(2)).iloc[2:].apply(lambda x: list(df.columns[np.where(x==max(x))]), axis=1)
Output:
print(df)
A B C Best
0 1 2 3 NaN
1 3 2 1 NaN
2 5 2 4 [A]
3 -3 0 2 [C]
4 6 3 4 [A, B]
Upvotes: 2
Reputation: 24314
Try:
maximum=df.sub(df.shift(2)).max(skipna=False,axis=1)
#finded the maximum value
df['Best']=(df.where(df.sub(df.shift(2)).isin(maximum))
.T
.apply(lambda x:x.dropna().index.tolist()))
Output of df
:
A B C Best
0 1 2 3 []
1 3 2 1 []
2 5 2 4 [A]
3 -3 0 2 [C]
4 6 3 4 [A, B]
Upvotes: 3