User2939

Reputation: 165

How to subtract numbers from lines to get time difference

Given data like this:

01:58:30| USER INPUT : "Hello " 
01:58:30| SYSTEM RESPONSE: "Hello. How are you" 
01:58:56| USER INPUT : "Good thank you. How about you?" 
01:58:57| SYSTEM RESPONSE: "I am doing great!" 
01:59:13| USER INPUT : "Thats it" 
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal" 

I want to compute the time elapsed between each line and the next. For example:

01:58:30| USER INPUT : "Hello " 
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you" 
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?" 
<1 seconds>
01:58:57| SYSTEM RESPONSE: "I am doing great!" 
<16 seconds>
01:59:13| USER INPUT : "Thats it" 
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"

So far, I know how to calculate the time difference:

from datetime import datetime
s1 = '01:59:13'
s2 = '01:59:15' # for example
time_format = '%H:%M:%S'  # avoid shadowing the built-in format()
time = datetime.strptime(s2, time_format) - datetime.strptime(s1, time_format)
print(time)

I just need a way to read the lines and apply this to each consecutive pair, so any suggestions are welcome. Please feel free to ask for clarification at any time!

Upvotes: 1

Views: 40

Answers (2)

Andrej Kesely

Reputation: 195418

You can use the re module to extract the time data. I wrote a simple generator that takes a string as input and yields every line along with the time interval between consecutive lines:

string_input = """
01:58:30| USER INPUT : "Hello "
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
01:58:56| USER INPUT : "Good thank you. How about you?"
01:58:57| SYSTEM RESPONSE: "I am doing great!"
01:59:13| USER INPUT : "Thats it"
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"
"""

import re
from datetime import datetime

def get_time(data):
    # Each match is a (whole line, timestamp) pair
    groups = re.findall(r'(([\d:]+)\|.*)', data)
    time_format = '%H:%M:%S'

    # Pair every line with the one that follows it
    for (line1, time1), (line2, time2) in zip(groups, groups[1:]):
        time1 = datetime.strptime(time1, time_format)
        time2 = datetime.strptime(time2, time_format)
        total_time = int((time2 - time1).total_seconds())
        singular_or_plural = 'second' if total_time == 1 else 'seconds'
        yield f'{line1}\n<{total_time} {singular_or_plural}>'
    # The last line has no successor, so emit it on its own
    if groups:
        yield groups[-1][0]

for line in get_time(string_input):
    print(line)

Output is:

01:58:30| USER INPUT : "Hello "
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?"
<1 second>
01:58:57| SYSTEM RESPONSE: "I am doing great!"
<16 seconds>
01:59:13| USER INPUT : "Thats it"
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"
<41413 seconds>
13:29:28| USER INPUT : "Deal"
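
If the log lives in a file rather than in a string, the same idea also works line by line without a regex. Below is a minimal sketch, assuming every non-empty line starts with an HH:MM:SS timestamp followed by `|`. The name `annotate_gaps` is a hypothetical helper, not part of any library, and it accepts any iterable of lines, so an open file object should work as well as a list.

```python
from datetime import datetime

TIME_FORMAT = '%H:%M:%S'

def annotate_gaps(lines):
    """Yield each log line, with '<N seconds>' between consecutive lines."""
    previous = None
    for raw in lines:
        line = raw.rstrip('\n')
        if '|' not in line:
            continue  # skip blank lines and lines without a timestamp separator
        # Everything before the first '|' is assumed to be the timestamp
        stamp = datetime.strptime(line.split('|', 1)[0].strip(), TIME_FORMAT)
        if previous is not None:
            gap = int((stamp - previous).total_seconds())
            unit = 'second' if gap == 1 else 'seconds'
            yield f'<{gap} {unit}>'
        yield line
        previous = stamp

# Small in-memory sample; with a real file, pass the open file object instead
sample = [
    '01:58:30| USER INPUT : "Hello "\n',
    '01:58:30| SYSTEM RESPONSE: "Hello. How are you"\n',
    '01:58:56| USER INPUT : "Good thank you. How about you?"\n',
]
result = list(annotate_gaps(sample))
for item in result:
    print(item)
```

Because it is a generator, only one line is held in memory at a time. A line whose text before the `|` is not a valid timestamp will raise ValueError from strptime, which this sketch does not catch.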

Upvotes: 2

DYZ

Reputation: 57033

Assuming that a "USER INPUT" row is always immediately followed by a matching "SYSTEM RESPONSE" row, here's a pandas-based solution:

First, read the data from the file:

import pandas as pd
df = pd.read_csv("your_file_name", sep=r'\s?[|:]\s+',
                 header=None, parse_dates=[0], engine='python')

Shift the time column up by one row and subtract the original column from it to get the row-to-row difference (the last row has no successor, so its difference is NaT, "not a time"):

df['diff'] = df[0].shift(-1) - df[0]

Remove the "date" part that the date parsing attached to each timestamp:

df[0] = df[0].dt.time
#       0                1                                 2     diff
#01:58:30       USER INPUT                          "Hello " 00:00:00
#01:58:30  SYSTEM RESPONSE              "Hello. How are you" 00:00:26
#01:58:56       USER INPUT  "Good thank you. How about you?" 00:00:01
#01:58:57  SYSTEM RESPONSE               "I am doing great!" 00:00:16
#01:59:13       USER INPUT                        "Thats it" 00:00:02
#01:59:15  SYSTEM RESPONSE                            "Deal" 11:30:13
#13:29:28       USER INPUT                            "Deal"      NaT

As a bonus, you get the times between the interactions.
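
If you would rather have the gaps as plain numbers of seconds, closer to the `<26 seconds>` format in the question, the timedelta values can be converted with `.dt.total_seconds()`. Here is a minimal sketch on a small hand-built Series instead of the file; the names `times` and `seconds` are illustrative only.

```python
import pandas as pd

# Three timestamps from the question, parsed with an explicit format
times = pd.to_datetime(
    pd.Series(['01:58:30', '01:58:30', '01:58:56']), format='%H:%M:%S')

# Same shift-and-subtract step as in the answer above
diff = times.shift(-1) - times

# Timedelta -> float seconds; the last row is NaN because nothing follows it
seconds = diff.dt.total_seconds()
print(seconds.tolist())
```

On the full frame, the equivalent would be `df['diff'].dt.total_seconds()`.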

Upvotes: 1
