Reputation: 755
I have a df:
col1 col2 col3
0 1 2 3
1 2 3 1
2 3 3 3
3 4 3 2
I want to add a new column based on the following conditions:
- if col1 > col2 > col3 -----> 2
- elif col1 > col2 -----> 1
- elif col1 < col2 < col3 -----> -2
- elif col1 < col2 -----> -1
- else -----> 0
And it should become this:
col1 col2 col3 new
0 1 2 3 -2
1 2 3 1 -1
2 3 3 3 0
3 4 3 2 2
I followed the method from this post by unutbu, with 1 greater than or less than is fine. But in my case with more than 1 greater than or less than, conditions returns error:
conditions = [
(df['col1'] > df['col2'] > df['col3']),
(df['col1'] > df['col2']),
(df['col1'] < df['col2'] < df['col3']),
(df['col1'] < df['col2'])]
choices = [2,1,-2,-1]
df['new'] = np.select(conditions, choices, default=0)
Traceback (most recent call last):
File "<ipython-input-43-768a4c0ecf9f>", line 2, in <module>
(df['col1'] > df['col2'] > df['col3']),
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How should I do this?
Upvotes: 4
Views: 234
Reputation: 28644
One option is with case_when from pyjanitor; under the hood it uses pd.Series.mask
.
The basic idea is a pairing of condition and expected value; you can pass as many pairings as required, followed by a default value and a target column name:
# pip install pyjanitor
import pandas as pd
import janitor
df.case_when(
# condition, value
'col1>col2>col3', 2,
'col1>col2', 1,
'col1<col2<col3', -2,
'col1<col2', -1,
0, # default
column_name = 'new')
col1 col2 col3 new
0 1 2 3 -2
1 2 3 1 -1
2 3 3 3 0
3 4 3 2 2
The code above uses strings for the conditions, which are evaluated by pd.eval
on the parent dataframe - note that speed wise, this can be slower for small datasets. A faster option (depending on the data size) would be to avoid the pd.eval
option:
df.case_when(
# condition, value
df.col1.gt(df.col2) & df.col2.gt(df.col3), 2,
df.col1.gt(df.col2), 1,
df.col1.lt(df.col2) & df.col2.lt(df.col3), -2,
df.col1.lt(df.col2), -1,
0, # default
column_name = 'new')
col1 col2 col3 new
0 1 2 3 -2
1 2 3 1 -1
2 3 3 3 0
3 4 3 2 2
Upvotes: 0
Reputation: 323266
Change your code to
conditions = [
(df['col1'] > df['col2']) & (df['col2'] > df['col3']),
(df['col1'] > df['col2']),
(df['col1'] < df['col2']) & (df['col2'] < df['col3']),
(df['col1'] < df['col2'])]
choices = [2,1,-2,-1]
df['new'] = np.select(conditions, choices, default=0)
Upvotes: 3