Reputation: 123
I want to create a new column in my data frame based on grouping variable in a column in the data frame and then check for condition in another column in the data frame.
I have tried to use np.where with pandas pd.groupby to create a Status column in the data frame where I am checking if the next value in the column is greater than the current one based on grouping each of the Sensor_ID's and based on this I am trying to assign if the Status is set to reset or not_reset, however I haven't been successful with the code.
import pandas as pd
df = pd.DataFrame(data = {'Sensor_ID':['A1', 'A1', 'A1', 'A2','A2', 'A2', 'A2', 'A3', 'A3', 'A3', 'A3', 'A3'], 'Reading':[81, 83.5, 87, 90, 81, 82, 85, 78, 79, 78, 80, 78]})
df
Sensor_ID Reading
0 A1 81.0
1 A1 83.5
2 A1 87.0
3 A2 90.0
4 A2 81.0
5 A2 82.0
6 A2 85.0
7 A3 78.0
8 A3 79.0
9 A3 78.0
10 A3 80.0
11 A3 78.0
I want to create the following condition below using np.where, but I want to use the Sensor_ID as a grouping variable.
df['Status'] = np.where(df.Reading.shift(-1) > df.Reading, 'not_reset', 'reset')
I used np.where with groupby and transform
df['Status'] = np.where(df.groupby('Sensor_ID')['Reading'].transform(df['Reading'].shift(-1) > df['Reading'], 'not_reset', 'reset'))
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I have also tried to use apply and transform with groupby but I get error:
df['Status'] = df.groupby('Sensor_ID').apply(lambda row: 'not_reset' if row['Reading'].shift(-1) > row['Reading'] else 'reset')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). --> As its comparing the whole series.
and
df['Status'] = df.groupby('Sensor_ID').transform(df['Reading'].shift(-1) > df['Reading'], 'not_reset', 'reset')
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Expected output:
Sensor_ID Reading Status
0 A1 81.0 not_reset
1 A1 83.5 not_reset
2 A1 87.0 not_reset
3 A2 90.0 not_reset
4 A2 81.0 reset
5 A2 82.0 not_reset
6 A2 85.0 not_reset
7 A3 78.0 not_reset
8 A3 79.0 not_reset
9 A3 78.0 reset
10 A3 80.0 not_reset
11 A3 78.0 reset
Upvotes: 3
Views: 2098
Reputation: 403218
You'll need to apply the condition after the grouping IOW, use the result of groupby
with np.where
).
I would use groupby
and diff
, it's the same as comparing the shifted-by-1 value. It's as simple as,
np.where(
df.groupby('Sensor_ID')['Reading'].diff().fillna(1) > 0, 'not reset', 'reset')
array(['not reset', 'not reset', 'not reset', 'not reset', 'reset',
'not reset', 'not reset', 'not reset', 'not reset', 'reset',
'not reset', 'reset'], dtype='<U9')
also see here for the initial version of my answer addressing a (now removed) question.
df['Status'] = np.where(
df.groupby('Sensor_ID')['Reading'].diff().fillna(1) > 0, 'not reset', 'reset')
df
Sensor_ID Reading Status
0 A1 81.0 not reset
1 A1 83.5 not reset
2 A1 87.0 not reset
3 A2 90.0 not reset
4 A2 81.0 reset
5 A2 82.0 not reset
6 A2 85.0 not reset
7 A3 78.0 not reset
8 A3 79.0 not reset
9 A3 78.0 reset
10 A3 80.0 not reset
11 A3 78.0 reset
Upvotes: 2