Reputation: 461
Working on a problem, I have the following dataframe in python
week hour week_hr store_code baskets
0 201616 106 201616106 505 0
1 201616 107 201616107 505 0
2 201616 108 201616108 505 0
3 201616 109 201616109 505 18
4 201616 110 201616110 505 0
5 201616 106 201616108 910 0
6 201616 107 201616106 910 0
7 201616 108 201616107 910 2
8 201616 109 201616108 910 3
9 201616 110 201616109 910 10
Here "hour" variable is a concat of "weekday" and "hour of shop", example weekday is monday=1 and hour of shop is 6am then hour variable = 106, similarly cal_hr is a concat of week and hour. I want to get those rows where i see a trend of no baskets , i.e 0 baskets for rolling 3 weeks. in the above case i will only get the first 3 rows. i.e. for store 505 there is a continuous cycle of 1 baskets from 106 to 108. But i do not want the rows (4,5,6) because even though there are 0 baskets for 3 continuous hours but the hours are actually NOT continuous. 110 -> 106 -> 107 . For the hours to be continuous they should lie in the range of 106 - 110.. Essentially i want all stores and the respective rows if it has 0 baskets for continuous 3 hours on any given day. Dummy output
week hour week_hr store_code baskets
0 201616 106 201616106 505 0
1 201616 107 201616107 505 0
2 201616 108 201616108 505 0
Can i do this in python using pandas and loops? The dataset requires sorting by store and hour. Completely new to python (
Upvotes: 4
Views: 244
Reputation: 2728
Do the following:
Now you can give groups to continuous and filter as you want.
import numpy as np
import pandas as pd
# 1
t1 = df.sort_values(['store_code', 'week_hr'])
# 2
t2 = t1[t1['baskets'] == 0]
# 3
continuous = t2['week_hr'][1:].values-t2['week_hr'][:-1].values == 1
groups = np.cumsum(np.hstack([False, continuous==False]))
t2['groups'] = groups
# 4
t3 = t2.groupby(['store_code', 'groups'], as_index=False)['week_hr'].count()
t4 = t3[t3.week_hr > 2]
print pd.merge(t2, t4[['store_code', 'groups']])
There's no need for looping!
Upvotes: 1
Reputation: 1989
You can solve:
Code:
t1 = df.sort_values(['store_code', 'week_hr'])
t2 = t1[t1['baskets'] == 0]
grouped = t2.groupby('store_code')['week_hr'].apply(lambda x: x.tolist())
for store_code, week_hrs in grouped.iteritems():
print(store_code, week_hrs)
# do something
Upvotes: 0