Will
Will

Reputation: 99

Incrementing dates in pandas groupby

I'm building a basic rota/schedule for staff, and have a DataFrame from a MySQL cursor which gives a list of IDs, dates and class

        id             the_date  class
0   195593  2017-09-12 14:00:00      3
1   193972  2017-09-13 09:15:00      2
2   195594  2017-09-13 14:00:00      3
3   195595  2017-09-15 14:00:00      3
4   193947  2017-09-16 17:30:00      3
5   195627  2017-09-17 08:00:00      2
6   193948  2017-09-19 11:30:00      2
7   195628  2017-09-21 08:00:00      2
8   193949  2017-09-21 11:30:00      2
9   195629  2017-09-24 08:00:00      2
10  193950  2017-09-24 10:00:00      2
11  193951  2017-09-27 11:30:00      2
12  195644  2017-09-28 06:00:00      1
13  194400  2017-09-28 08:00:00      1
14  195630  2017-09-28 08:00:00      2
15  193952  2017-09-29 11:30:00      2
16  195631  2017-10-01 08:00:00      2
17  194401  2017-10-06 08:00:00      1
18  195645  2017-10-06 10:00:00      1
19  195632  2017-10-07 13:30:00      3

If the class == 1, I need that instance duplicated 5 times.

first_class = df[df['class'] == 1]
non_first_class = df[df['class'] != 1]
first_class_replicated = pd.concat([tests_df]*5,ignore_index=True).sort_values(['the_date'])

    id             the_date  class
0   195644  2017-09-28 06:00:00      1
16  195644  2017-09-28 06:00:00      1
4   195644  2017-09-28 06:00:00      1
12  195644  2017-09-28 06:00:00      1
8   195644  2017-09-28 06:00:00      1
17  194400  2017-09-28 08:00:00      1
13  194400  2017-09-28 08:00:00      1
9   194400  2017-09-28 08:00:00      1
5   194400  2017-09-28 08:00:00      1
1   194400  2017-09-28 08:00:00      1
6   194401  2017-10-06 08:00:00      1
18  194401  2017-10-06 08:00:00      1
10  194401  2017-10-06 08:00:00      1
14  194401  2017-10-06 08:00:00      1
2   194401  2017-10-06 08:00:00      1
11  195645  2017-10-06 10:00:00      1
3   195645  2017-10-06 10:00:00      1
15  195645  2017-10-06 10:00:00      1
7   195645  2017-10-06 10:00:00      1
19  195645  2017-10-06 10:00:00      1

I then merge non_first_class and first_class_replicated. Before that though, I need the dates in first_class_replicated to increment by one day, grouped by id. Below is how I need it to look. Is there an elegant Pandas solution to this, or should I be looking at looping over a groupby series to modify the dates?

Desired:

id      
0   195644  2017-09-28 6:00:00
16  195644  2017-09-29 6:00:00
4   195644  2017-09-30 6:00:00
12  195644  2017-10-01 6:00:00
8   195644  2017-10-02 6:00:00
17  194400  2017-09-28 8:00:00
13  194400  2017-09-29 8:00:00
9   194400  2017-09-30 8:00:00
5   194400  2017-10-01 8:00:00
1   194400  2017-10-02 8:00:00
6   194401  2017-10-06 8:00:00
18  194401  2017-10-07 8:00:00
10  194401  2017-10-08 8:00:00
14  194401  2017-10-09 8:00:00
2   194401  2017-10-10 8:00:00
11  195645  2017-10-06 10:00:00
3   195645  2017-10-07 10:00:00
15  195645  2017-10-08 10:00:00
7   195645  2017-10-09 10:00:00
19  195645  2017-10-10 10:00:00

Upvotes: 1

Views: 218

Answers (1)

jezrael
jezrael

Reputation: 862441

You can use cumcount for count categories, then convert to_timedelta and add to column:

#another solution for repeat
first_class_replicated = first_class.loc[np.repeat(first_class.index, 5)]
                                    .sort_values(['the_date'])

df1 = first_class_replicated.groupby('id').cumcount()
first_class_replicated['the_date'] += pd.to_timedelta(df1, unit='D')
print (first_class_replicated)
        id            the_date  class
0   195644 2017-09-28 06:00:00      1
16  195644 2017-09-29 06:00:00      1
4   195644 2017-09-30 06:00:00      1
12  195644 2017-10-01 06:00:00      1
8   195644 2017-10-02 06:00:00      1
17  194400 2017-09-28 08:00:00      1
13  194400 2017-09-29 08:00:00      1
9   194400 2017-09-30 08:00:00      1
5   194400 2017-10-01 08:00:00      1
1   194400 2017-10-02 08:00:00      1
6   194401 2017-10-06 08:00:00      1
18  194401 2017-10-07 08:00:00      1
10  194401 2017-10-08 08:00:00      1
14  194401 2017-10-09 08:00:00      1
2   194401 2017-10-10 08:00:00      1
11  195645 2017-10-06 10:00:00      1
3   195645 2017-10-07 10:00:00      1
15  195645 2017-10-08 10:00:00      1
7   195645 2017-10-09 10:00:00      1
19  195645 2017-10-10 10:00:00      1

Upvotes: 2

Related Questions