aviss
aviss

Reputation: 2439

Counting rows back and forth based on time column

I have a dataframe with user ids and two different times. time1 is the same for one user, but time2 is different.

test = pd.DataFrame({
    'id': [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2],
    'time1': ['2018-11-01 21:19:32', '2018-11-01 21:19:32', '2018-11-01 21:19:32','2018-11-01 21:19:32','2018-11-01 21:19:32',
             '2018-11-01 21:19:32', '2018-11-01 21:19:32', '2018-11-01 21:19:32','2018-11-01 21:19:32','2018-11-01 21:19:32',
             '2018-11-02 11:20:12', '2018-11-02 11:20:12','2018-11-02 11:20:12','2018-11-02 11:20:12','2018-11-02 11:20:12'],
    'time2': ['2018-11-01 10:19:32', '2018-11-01 22:19:32', '2018-11-01 12:19:32','2018-11-01 23:44:32','2018-11-01 14:19:32',
              '2018-11-01 15:19:32', '2018-11-01 11:19:32', '2018-11-01 23:19:32','2018-11-01 13:22:32','2018-11-01 23:56:32',
             '2018-11-02 11:57:12', '2018-11-02 10:20:12','2018-11-02 11:25:12','2018-11-02 11:32:12','2018-11-02 09:15:12']
    })

I would like to create a row_num column which sorts and counts time2 according to time1. Everything which happened before time1 is counted in reverse:

    id  time1               time2                row_num
0   1   2018-11-01 21:19:32 2018-11-01 10:19:32 -6
1   1   2018-11-01 21:19:32 2018-11-01 11:19:32 -5
2   1   2018-11-01 21:19:32 2018-11-01 12:19:32 -4
3   1   2018-11-01 21:19:32 2018-11-01 13:19:32 -3
4   1   2018-11-01 21:19:32 2018-11-01 14:19:32 -2
5   1   2018-11-01 21:19:32 2018-11-01 15:19:32 -1
6   1   2018-11-01 21:19:32 2018-11-01 22:19:32 1
7   1   2018-11-01 21:19:32 2018-11-01 23:19:32 2
8   1   2018-11-01 21:19:32 2018-11-01 23:44:32 3
9   1   2018-11-01 21:19:32 2018-11-01 23:56:32 4
10  2   2018-11-02 11:20:12 2018-11-02 09:20:12 -2
11  2   2018-11-02 11:20:12 2018-11-02 10:20:12 -1
12  2   2018-11-02 11:20:12 2018-11-02 11:25:12 1
13  2   2018-11-02 11:20:12 2018-11-02 11:32:12 2
14  2   2018-11-02 11:20:12 2018-11-02 11:57:12 3

Will appreciate your help and advice!

Upvotes: 2

Views: 23

Answers (1)

jezrael
jezrael

Reputation: 862481

Use cumcount with no parameter and also with ascending=False:

#necessary unique default RangeIndex
test = test.reset_index(drop=True)

#convert columns to datetimes
test[['time1','time2']] = test[['time1','time2']].apply(pd.to_datetime)
#sorting both columns
test = test.sort_values(['id','time1','time2'])
#boolean mask
m = test['time2'] < test['time1']

#filter and get counter, last join togather
test['row_num'] = pd.concat([(test[m].groupby('id').cumcount(ascending=False) +1) * -1,
                              test[~m].groupby('id').cumcount() + 1])
print (test)
    id               time1               time2  row_num
0    1 2018-11-01 21:19:32 2018-11-01 10:19:32       -6
6    1 2018-11-01 21:19:32 2018-11-01 11:19:32       -5
2    1 2018-11-01 21:19:32 2018-11-01 12:19:32       -4
8    1 2018-11-01 21:19:32 2018-11-01 13:22:32       -3
4    1 2018-11-01 21:19:32 2018-11-01 14:19:32       -2
5    1 2018-11-01 21:19:32 2018-11-01 15:19:32       -1
1    1 2018-11-01 21:19:32 2018-11-01 22:19:32        1
7    1 2018-11-01 21:19:32 2018-11-01 23:19:32        2
3    1 2018-11-01 21:19:32 2018-11-01 23:44:32        3
9    1 2018-11-01 21:19:32 2018-11-01 23:56:32        4
14   2 2018-11-02 11:20:12 2018-11-02 09:15:12       -2
11   2 2018-11-02 11:20:12 2018-11-02 10:20:12       -1
12   2 2018-11-02 11:20:12 2018-11-02 11:25:12        1
13   2 2018-11-02 11:20:12 2018-11-02 11:32:12        2
10   2 2018-11-02 11:20:12 2018-11-02 11:57:12        3

Upvotes: 2

Related Questions