haimen

Reputation: 2015

Create group identifier for the timestamps

I have the following data:

  timediff
2018-06-19 01:00:00
2018-06-19 01:00:01
2018-06-19 01:00:02
2018-06-19 01:00:03
2018-06-19 02:00:00
2018-06-19 02:00:01
2018-06-19 02:00:02
2018-06-19 02:00:03
2018-06-19 02:15:00
2018-06-19 02:15:01
2018-06-19 02:15:02
2018-06-19 02:15:03
2018-06-19 02:30:00
2018-06-19 02:30:01
2018-06-19 02:30:02
2018-06-19 02:30:03

I want to create a group identifier for each group of timestamps in the data. Timestamps that fall within 4-5 seconds of each other should share the same identifier.

The output should look like this:

timediff                identifier
2018-06-19 01:00:00          1
2018-06-19 01:00:01          1
2018-06-19 01:00:02          1
2018-06-19 01:00:03          1
2018-06-19 02:00:00          2
2018-06-19 02:00:01          2
2018-06-19 02:00:02          2
2018-06-19 02:00:03          2
2018-06-19 02:15:00          3
2018-06-19 02:15:01          3
2018-06-19 02:15:02          3
2018-06-19 02:15:03          3
2018-06-19 02:30:00          4
2018-06-19 02:30:01          4
2018-06-19 02:30:02          4
2018-06-19 02:30:03          4

Since the entries in each cluster are within 4-5 seconds of each other, I want to keep them in one group, and I want to identify all such groups in the same way.

I am new to Python and not sure how to do this.

Can anybody help me with this?

Upvotes: 1

Views: 49

Answers (2)

tinyhare

Reputation: 2401

I have previously grouped timestamps into five-minute intervals. Timestamps that fall in the same interval get the same group_key:

group_key = int(timestamp / interval) * interval

Here group_key is the start of the half-open interval [group_key, group_key + interval) that contains the timestamp.

For example, with an interval of 5 seconds:

group_key | timestamp | time
1529341200 1529341200 '2018-06-19 01:00:00'
1529341200 1529341201 '2018-06-19 01:00:01'
1529341200 1529341202 '2018-06-19 01:00:02'
1529341200 1529341203 '2018-06-19 01:00:03'
1529341200 1529341204 '2018-06-19 01:00:04'

1529341205 1529341205 '2018-06-19 01:00:05'
1529341205 1529341206 '2018-06-19 01:00:06'
1529341205 1529341207 '2018-06-19 01:00:07'
1529341205 1529341208 '2018-06-19 01:00:08'
1529341205 1529341209 '2018-06-19 01:00:09'

1529341210 1529341210 '2018-06-19 01:00:10'
1529341210 1529341211 '2018-06-19 01:00:11'
1529341210 1529341212 '2018-06-19 01:00:12'
1529341210 1529341213 '2018-06-19 01:00:13'
1529341210 1529341214 '2018-06-19 01:00:14'

For your question:

import time

datetimes = [
    '2018-06-19 01:00:00', '2018-06-19 01:00:01', '2018-06-19 01:00:02', '2018-06-19 01:00:03',
    '2018-06-19 02:00:00', '2018-06-19 02:00:01', '2018-06-19 02:00:02', '2018-06-19 02:00:03',
    '2018-06-19 02:15:00', '2018-06-19 02:15:01', '2018-06-19 02:15:02', '2018-06-19 02:15:03',
    '2018-06-19 02:30:00', '2018-06-19 02:30:01', '2018-06-19 02:30:02', '2018-06-19 02:30:03',
]

time_interval = 5  # bucket width in seconds

print("timediff               identifier")
for dt in datetimes:
    # time.mktime interprets the parsed time in the local timezone,
    # so the exact numbers depend on the machine's timezone
    timestamp = int(time.mktime(time.strptime(dt, '%Y-%m-%d %H:%M:%S')))
    # Round down to the start of the bucket
    identifier = timestamp // time_interval * time_interval
    print("'" + dt + "'", identifier)

Note that the identifier is not 1, 2, 3, 4 but the starting timestamp of the group, which I think is more meaningful. If you must have 1, 2, 3, 4, you need a further transformation.

Output:

timediff               identifier
'2018-06-19 01:00:00' 1529341200
'2018-06-19 01:00:01' 1529341200
'2018-06-19 01:00:02' 1529341200
'2018-06-19 01:00:03' 1529341200
'2018-06-19 02:00:00' 1529344800
'2018-06-19 02:00:01' 1529344800
'2018-06-19 02:00:02' 1529344800
'2018-06-19 02:00:03' 1529344800
'2018-06-19 02:15:00' 1529345700
'2018-06-19 02:15:01' 1529345700
'2018-06-19 02:15:02' 1529345700
'2018-06-19 02:15:03' 1529345700
'2018-06-19 02:30:00' 1529346600
'2018-06-19 02:30:01' 1529346600
'2018-06-19 02:30:02' 1529346600
'2018-06-19 02:30:03' 1529346600
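
The "further transformation" to 1, 2, 3, 4 mentioned above could look roughly like the following. This is only a sketch, not part of the original answer: the function name sequential_identifiers is made up for illustration, and it uses calendar.timegm (which treats the strings as UTC) instead of time.mktime so that the result does not depend on the local timezone. Each new bucket key gets the next number in order of first appearance.

```python
import calendar
import time


def sequential_identifiers(datetimes, interval=5):
    """Return (datetime string, identifier) pairs with identifiers 1, 2, 3, ...

    Hypothetical helper: buckets are fixed windows of `interval` seconds,
    numbered in the order in which they first appear.
    """
    key_to_id = {}
    result = []
    for dt in datetimes:
        # Assumption: the strings are treated as UTC (calendar.timegm)
        timestamp = calendar.timegm(time.strptime(dt, '%Y-%m-%d %H:%M:%S'))
        group_key = timestamp // interval * interval
        if group_key not in key_to_id:
            key_to_id[group_key] = len(key_to_id) + 1
        result.append((dt, key_to_id[group_key]))
    return result


sample = [
    '2018-06-19 01:00:00', '2018-06-19 01:00:03',
    '2018-06-19 02:00:00', '2018-06-19 02:00:03',
    '2018-06-19 02:15:01',
]
for dt, identifier in sequential_identifiers(sample):
    print(dt, identifier)
```

Keep in mind that fixed buckets can split a cluster that straddles a bucket boundary, for example 01:00:04 and 01:00:05 with a 5-second interval.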

Upvotes: 1

Zach King

Reputation: 1218

The help you are looking for is in Python's datetime module--specifically the datetime.timedelta class.

Given two datetime instances in Python, you can get their difference simply by subtracting them, and this difference is given to you in the form of a datetime.timedelta instance:

import datetime

# Parse a couple of datetimes...
t1 = datetime.datetime.strptime('2018-06-19 14:23:14', '%Y-%m-%d %H:%M:%S')
t2 = datetime.datetime.strptime('2018-06-19 14:23:16', '%Y-%m-%d %H:%M:%S')

diff = t2 - t1  # Get the timedelta

if diff.total_seconds() < 4:
    # t1 and t2 are in the same "group"
    print('same group')

The total_seconds() method of the timedelta gives you the full difference between the two datetimes in seconds, as a float. The .seconds attribute, by contrast, holds only the seconds component (0 to 86399) and ignores whole days, so it is not the total difference once the datetimes are a day or more apart.

Given that knowledge you could iterate over a list of datetime strings and group them like so (ASSUMING THE TIMESTAMPS ARE ALREADY IN ORDER/SEQUENCE):

import datetime

datetimes = ['2018-06-19 14:23:14', '2018-06-19 14:23:16', '2018-06-19 14:23:27', '2018-06-19 14:23:28', '2018-06-19 14:23:29']

# For collecting the groups
grouped_datetimes = []

# Assumes the datetimes are already in order; if not, you can sort them beforehand
min_ts = datetime.datetime.strptime(datetimes[0], '%Y-%m-%d %H:%M:%S')
group = [datetimes[0]]
for dt in datetimes[1:]:
    ts = datetime.datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
    diff = ts - min_ts
    # total_seconds() counts whole days too; .seconds alone would not
    if diff.total_seconds() < 4:
        group.append(dt)
    else:
        grouped_datetimes.append(group)
        group = [dt]
        min_ts = ts

# Add the last group that was built up
if group:
    grouped_datetimes.append(group)


for index, group in enumerate(grouped_datetimes):
    for ts in group:
        print(f'{ts}\t{index}')

That will output:

2018-06-19 14:23:14 0
2018-06-19 14:23:16 0
2018-06-19 14:23:27 1
2018-06-19 14:23:28 1
2018-06-19 14:23:29 1

That's just a quick and dirty solution; depending on your exact use case, you could definitely improve it. Hopefully you get the idea of how to use timedeltas to solve it though.
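
As one possible refinement, here is a sketch that produces the 1-based identifiers from the question directly. It is a variant, not the code above: instead of comparing each timestamp with the first one in its group, it starts a new group whenever the gap to the previous timestamp exceeds a threshold. The name label_by_gap and the 5-second default for max_gap are assumptions made for illustration, and the input is again assumed to be sorted.

```python
from datetime import datetime, timedelta


def label_by_gap(datetimes, max_gap=timedelta(seconds=5)):
    """Return (datetime string, identifier) pairs, identifiers starting at 1.

    Hypothetical helper: a new group starts whenever the gap to the
    previous timestamp is larger than max_gap. Assumes sorted input.
    """
    labelled = []
    identifier = 0
    previous = None
    for dt in datetimes:
        ts = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
        if previous is None or ts - previous > max_gap:
            identifier += 1  # gap too large: open the next group
        labelled.append((dt, identifier))
        previous = ts
    return labelled


sample = [
    '2018-06-19 01:00:00', '2018-06-19 01:00:03',
    '2018-06-19 02:00:00', '2018-06-19 02:00:03',
    '2018-06-19 02:15:00', '2018-06-19 02:15:03',
    '2018-06-19 02:30:00', '2018-06-19 02:30:03',
]
for dt, identifier in label_by_gap(sample):
    print(f'{dt}\t{identifier}')
```

With this sample the identifiers come out as 1, 1, 2, 2, 3, 3, 4, 4. Comparing timedelta objects directly avoids the .seconds pitfall, because the comparison takes whole days into account.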

Upvotes: 0
