Nabih Bawazir

Reputation: 7255

How to make a day-hour count on a pandas DataFrame

I did a multi-day observation; one customer can be observed over several days. Here's my data:

customer_id   value    timestamp
1             1000     2018-05-28 03:40:00.000
1             1450     2018-05-28 04:40:01.000
1             1040     2018-05-28 05:40:00.000
1             1500     2018-05-29 02:40:00.000
1             1090     2018-05-29 04:40:00.000
3             1060     2018-05-18 03:40:00.000
3             1040     2018-05-18 05:40:00.000
3             1520     2018-05-19 03:40:00.000
3             1490     2018-05-19 04:40:00.000

Based on my previous question, "How do I building dt.hour in 2 days", the first appearance of a customer, 2018-05-28 03:40:00.000, is labelled Day1 - 3, but for another purpose it should be Day1 - 0, so the output would be:

customer_id   value    timestamp                hour
1             1000     2018-05-28 03:40:00.000  Day1 - 0
1             1450     2018-05-28 04:40:01.000  Day1 - 1
1             1040     2018-05-28 05:40:00.000  Day1 - 2
1             1500     2018-05-29 02:40:00.000  Day1 - 23
1             1090     2018-05-29 04:40:00.000  Day2 - 1
3             1060     2018-05-18 03:40:00.000  Day1 - 0
3             1040     2018-05-18 05:40:00.000  Day1 - 2
3             1520     2018-05-19 03:40:00.000  Day2 - 0
3             1490     2018-05-19 04:40:00.000  Day2 - 1

Upvotes: 1

Views: 427

Answers (1)

jezrael

Reputation: 862511

I think you need to add all the missing hours per customer so that cumcount gives the correct hour offset:

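import pandas as pd

#make sure timestamp is datetime, then floor to hours
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.floor('h')
#add missing hours per group
df = df.set_index('timestamp').groupby('customer_id').apply(lambda x: x.asfreq('h'))
#cumulative count per group = hours since the customer's first observation
df['hour'] = df.groupby(level=0).cumcount()
#drop the filler rows and the redundant customer_id column (it is also in the index)
df = df.dropna(subset=['customer_id']).drop(columns='customer_id').reset_index()

#day = hours // 24 + 1, hour within the day = hours % 24
df['hour'] = ('Day' + (df['hour'] // 24).add(1).astype(str) +
              ' - ' + (df['hour'] % 24).astype(str))
print(df)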
   customer_id           timestamp   value       hour
0            1 2018-05-28 03:00:00  1000.0   Day1 - 0
1            1 2018-05-28 04:00:00  1450.0   Day1 - 1
2            1 2018-05-28 05:00:00  1040.0   Day1 - 2
3            1 2018-05-29 02:00:00  1500.0  Day1 - 23
4            1 2018-05-29 04:00:00  1090.0   Day2 - 1
5            3 2018-05-18 03:00:00  1060.0   Day1 - 0
6            3 2018-05-18 05:00:00  1040.0   Day1 - 2
7            3 2018-05-19 03:00:00  1520.0   Day2 - 0
8            3 2018-05-19 04:00:00  1490.0   Day2 - 1
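
If the filler rows are not needed for anything else, the same labels can probably be computed without reindexing, by subtracting each customer's first floored timestamp and counting whole hours. This is only a sketch of an alternative, not the approach above: the names `floored`, `first` and `elapsed` are made up for illustration, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 3, 3, 3, 3],
    'value': [1000, 1450, 1040, 1500, 1090, 1060, 1040, 1520, 1490],
    'timestamp': pd.to_datetime([
        '2018-05-28 03:40:00', '2018-05-28 04:40:01', '2018-05-28 05:40:00',
        '2018-05-29 02:40:00', '2018-05-29 04:40:00',
        '2018-05-18 03:40:00', '2018-05-18 05:40:00',
        '2018-05-19 03:40:00', '2018-05-19 04:40:00',
    ]),
})

# Floor every timestamp to the whole hour
floored = df['timestamp'].dt.floor('h')
# First observed hour of each customer, aligned to the original rows
first = floored.groupby(df['customer_id']).transform('min')
# Whole hours elapsed since that first observation (integer Series)
elapsed = (floored - first) // pd.Timedelta(hours=1)

# Day number = elapsed // 24 + 1, hour within the day = elapsed % 24
df['hour'] = ('Day' + (elapsed // 24 + 1).astype(str) +
              ' - ' + (elapsed % 24).astype(str))
print(df)
```

Unlike the asfreq version, this keeps the original timestamps and the integer dtype of value, and it does not require the floored timestamps to be unique within a customer.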

Upvotes: 1
