Reputation: 85
I currently have a dataset with 4 columns. One has TimeStamps and the others have strings which represent electrical meter statuses such as "PRODUCT", "PRODUCTOFF", "HEAT" and so on.
The data is currently in a 10 minute format. I need to get this data hourly and keep the most frequent value. So if I have "PRODUCT, PRODUCT, PRODUCT, WATER, HEAT, WATEROFF" for all those 10m TimeStamps, I want it to show PRODUCT. In the situation where I have 3 strings showing twice or 2 strings showing thrice, it can pick any of them.
I was thinking on organizing the data by frequency and getting the nlargest(1) value but I can't get it to work.
This is what I have:
TimeStamp Meter1 Meter2 Meter3
2019-09-16 12:00:00 OFFWATER PRODUCT OFF
2019-09-16 12:10:00 HEAT PRODUCT OFF
2019-09-16 12:20:00 FANS PRODUCT OFF
2019-09-16 12:30:00 FANS PRODUCT OFF
2019-09-16 12:40:00 FANS PRODUCT OFF
2019-09-16 12:50:00 FANS PRODUCT OFF
2019-09-16 13:00:00 HEAT PRODUCT OFF
2019-09-16 13:10:00 HEAT PRODUCT OFF
2019-09-16 13:20:00 HEAT PRODUCT OFF
2019-09-16 13:30:00 WATER PRODUCT OFF
2019-09-16 13:40:00 WATER PRODUCT OFF
2019-09-16 13:50:00 PRODUCT PRODUCT OFF
This is what I need:
TimeStamp Meter1 Meter2 Meter3
2019-09-16 12:00:00 FANS PRODUCT OFF
2019-09-16 13:00:00 HEAT PRODUCT OFF
Any help would be appreciated.
Upvotes: 0
Views: 179
Reputation: 323276
We can check mode
, notice here i slice the 1st value after mode
, since we may have two values are modes for the group
df.groupby(df['TimeStamp'].dt.strftime('%Y-%m-%d %H')).apply(lambda x : x.mode().iloc[0])
0 TimeStamp Meter1 Meter2 Meter3
TimeStamp
2019-09-16 12 2019-09-16 12:00:00 FANS PRODUCT OFF
2019-09-16 13 2019-09-16 13:00:00 HEAT PRODUCT OFF
Upvotes: 1