Rene

Reputation: 41

How to assign group labels to pandas df rows that have a datetime within a specific interval?

I am trying to sort .txt files by the time they were created. A set of 6-8 .txt files is created several times a day, all within a few minutes of each other. I do not know the exact time intervals, so I will have to find a way to automatically group the closest matching date-times (e.g. all that are less than 15 min apart). I have been able to extract the DateTime for each file. Now I would like to assign a group label that indicates which .txt files were created as a set (i.e. within a few minutes of each other).

My current df looks like this:

index                         values
2020-09-06 17:25:14           97
2020-09-06 17:25:33            0
2020-09-06 17:27:00            3
2020-09-06 17:28:13            7
2020-09-06 17:29:28           10
2020-09-06 17:30:07           26
2020-09-06 17:30:40           34
2020-09-06 17:31:13           34
2020-09-06 18:07:34           99
2020-09-06 18:08:07            0
2020-09-06 18:08:35            3
2020-09-06 18:09:00            8
2020-09-06 18:09:24           11
2020-09-06 18:09:57           32
2020-09-06 18:10:24           43
2020-09-06 19:03:45           99
2020-09-06 19:04:31            0

I would like to automatically assign label "a" to all rows between 17:25 and 17:31, then label "b" to all rows between 18:07 and 18:10, then label "c" to all rows between 19:03 and 19:04.

Most solutions I have found only aggregate (groupby(), resample(), pd.Grouper()). Can I use one of these methods to create my labels?

I thought that this might be a useful start but as far as I understand the solution, it only creates a certain index of a specified index for me.

Thanks. (I am happy to share an example .txt file and my code, if that is possible here.)

Upvotes: 1

Views: 654

Answers (1)

It_is_Chris

Reputation: 14113

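Create your conditions and choices, then use df.between_time and np.select. Note that between_time compares against the exact clock times given, so an end of '17:31' means 17:31:00 and a row stamped 17:31:13 falls outside it. Such rows get the default value (shown as nan in the output below) unless you push each end time out by a minute.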

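import numpy as np

# One boolean mask per time window (df needs a DatetimeIndex for between_time)
cond = [df.index.isin(df.between_time('17:25', '17:31').index),
        df.index.isin(df.between_time('18:07', '18:10').index),
        df.index.isin(df.between_time('19:03', '19:04').index)]

# Labels, in the same order as the masks
choices = ['a', 'b', 'c']

# Rows matching no window get the default; a string default shares a dtype
# with the string labels (newer NumPy rejects mixing them with np.nan)
df['new_col'] = np.select(cond, choices, default='nan')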

                     values new_col
index                              
2020-09-06 17:25:14      97       a
2020-09-06 17:25:33       0       a
2020-09-06 17:27:00       3       a
2020-09-06 17:28:13       7       a
2020-09-06 17:29:28      10       a
2020-09-06 17:30:07      26       a
2020-09-06 17:30:40      34       a
2020-09-06 17:31:13      34     nan
2020-09-06 18:07:34      99       b
2020-09-06 18:08:07       0       b
2020-09-06 18:08:35       3       b
2020-09-06 18:09:00       8       b
2020-09-06 18:09:24      11       b
2020-09-06 18:09:57      32       b
2020-09-06 18:10:24      43     nan
2020-09-06 19:03:45      99       c
2020-09-06 19:04:31       0     nan
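
The question also asks for the sets to be found automatically rather than from hard-coded times. One way this might be done, sketched below, is to start a new group whenever the gap to the previous row exceeds a threshold. The 15-minute threshold comes from the question; the names max_gap, gaps, group_id and the "group" column are made up for this illustration, and the frame is rebuilt from the data in the post.

```python
import string
import pandas as pd

# Rebuild the frame from the question (timestamps and values copied from the post)
stamps = [
    "2020-09-06 17:25:14", "2020-09-06 17:25:33", "2020-09-06 17:27:00",
    "2020-09-06 17:28:13", "2020-09-06 17:29:28", "2020-09-06 17:30:07",
    "2020-09-06 17:30:40", "2020-09-06 17:31:13", "2020-09-06 18:07:34",
    "2020-09-06 18:08:07", "2020-09-06 18:08:35", "2020-09-06 18:09:00",
    "2020-09-06 18:09:24", "2020-09-06 18:09:57", "2020-09-06 18:10:24",
    "2020-09-06 19:03:45", "2020-09-06 19:04:31",
]
values = [97, 0, 3, 7, 10, 26, 34, 34, 99, 0, 3, 8, 11, 32, 43, 99, 0]
df = pd.DataFrame({"values": values},
                  index=pd.DatetimeIndex(stamps, name="index")).sort_index()

# Assumed threshold, taken from the "less than 15 min apart" in the question
max_gap = pd.Timedelta(minutes=15)

# Time elapsed since the previous row; the first row has no predecessor (NaT)
gaps = df.index.to_series().diff()

# True marks the first row of a new set; the running sum of those flags
# numbers the sets 0, 1, 2, ...
group_id = (gaps > max_gap).cumsum()

# Turn 0, 1, 2, ... into "a", "b", "c", ... (this mapping only covers 26 groups)
df["group"] = group_id.map(lambda i: string.ascii_lowercase[i])

print(df)
```

Because the split depends only on the gaps, the last file of each set (17:31:13, 18:10:24, 19:04:31) is labelled along with the rest. The "group" column can then be passed to df.groupby("group") if a per-set aggregate is needed.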

Upvotes: 1
