Reputation: 165
Given log data like this:
01:58:30| USER INPUT : "Hello "
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
01:58:56| USER INPUT : "Good thank you. How about you?"
01:58:57| SYSTEM RESPONSE: "I am doing great!"
01:59:13| USER INPUT : "Thats it"
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"
I want to compute the time elapsed between each line and the next one. For example:
01:58:30| USER INPUT : "Hello "
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?"
<1 seconds>
01:58:57| SYSTEM RESPONSE: "I am doing great!"
<16 seconds>
01:59:13| USER INPUT : "Thats it"
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"
So far, I know how to calculate the time difference between two timestamps:
from datetime import datetime
s1 = '01:59:13'
s2 = '01:59:15'  # for example
time_format = '%H:%M:%S'  # avoid shadowing the built-in format()
elapsed = datetime.strptime(s2, time_format) - datetime.strptime(s1, time_format)
print(elapsed)
I could use suggestions on how to read the lines and pull the timestamp out of each one. Please feel free to ask for clarification!
Upvotes: 1
Views: 40
Reputation: 195418
You can use the re module to extract the time data. I wrote a simple generator that takes a string and yields every line, each followed by the time interval to the next line:
string_input = """
01:58:30| USER INPUT : "Hello "
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
01:58:56| USER INPUT : "Good thank you. How about you?"
01:58:57| SYSTEM RESPONSE: "I am doing great!"
01:59:13| USER INPUT : "Thats it"
01:59:15| SYSTEM RESPONSE: "Deal"
13:29:28| USER INPUT : "Deal"
"""
import re
from datetime import datetime
def get_time(data):
    # Capture each full line (group 1) and its leading timestamp (group 2).
    groups = re.findall(r'(([\d:]+)\|.*)', data)
    time_format = '%H:%M:%S'
    # Pair every line with the line that follows it.
    for (line1, time1), (line2, time2) in zip(groups, groups[1:]):
        time1 = datetime.strptime(time1, time_format)
        time2 = datetime.strptime(time2, time_format)
        total_time = int((time2 - time1).total_seconds())
        singular_or_plural = 'second' if total_time == 1 else 'seconds'
        yield f'{line1}\n<{total_time} {singular_or_plural}>'
    # The last line has no successor, so yield it without an interval.
    if groups:
        yield groups[-1][0]

for line in get_time(string_input):
    print(line)
Output is:
01:58:30| USER INPUT : "Hello "
<0 seconds>
01:58:30| SYSTEM RESPONSE: "Hello. How are you"
<26 seconds>
01:58:56| USER INPUT : "Good thank you. How about you?"
<1 second>
01:58:57| SYSTEM RESPONSE: "I am doing great!"
<16 seconds>
01:59:13| USER INPUT : "Thats it"
<2 seconds>
01:59:15| SYSTEM RESPONSE: "Deal"
<41413 seconds>
13:29:28| USER INPUT : "Deal"
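Since the question also asks how to read the lines, here is a minimal sketch of the same idea that works on any iterable of lines (an open file, for instance) instead of one big string. The function name annotate_intervals and the file name chat.log are placeholders of mine, not from the question, and the sketch assumes every log line starts with an HH:MM:SS timestamp followed by "|".

```python
from datetime import datetime

TIME_FORMAT = '%H:%M:%S'

def annotate_intervals(lines):
    """Yield each log line, with '<N seconds>' between consecutive lines."""
    previous = None
    for raw in lines:
        line = raw.rstrip('\n')
        if '|' not in line:
            continue  # skip blank lines and lines without a '|' separator
        # Everything before the first '|' is assumed to be the timestamp.
        stamp = datetime.strptime(line.split('|', 1)[0].strip(), TIME_FORMAT)
        if previous is not None:
            seconds = int((stamp - previous).total_seconds())
            yield f'<{seconds} seconds>'
        previous = stamp
        yield line

# Small in-memory demo; a file object can be passed the same way.
sample = [
    '01:58:30| USER INPUT : "Hello "\n',
    '01:58:30| SYSTEM RESPONSE: "Hello. How are you"\n',
    '01:58:56| USER INPUT : "Good thank you. How about you?"\n',
]
for out in annotate_intervals(sample):
    print(out)
```

With a real file it would be used as "with open('chat.log') as fh:" followed by a loop over annotate_intervals(fh), so the file is read line by line and never loaded whole.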
Upvotes: 2
Reputation: 57033
Assuming that a "USER INPUT" row is always immediately followed by a matching "SYSTEM RESPONSE" row, here's a pandas-based solution:
First, read the data from the file:
import pandas as pd
df = pd.read_csv("your_file_name", sep=r'\s?[|:]\s+',
                 header=None, parse_dates=[0],
                 engine='python')  # regex separators need the python engine
Shift the time column up by one row and subtract the original column from it to get the row-to-row difference (NaT means not-a-time, and marks the last row, which has no successor):
df['diff'] = df[0].shift(-1) - df[0]
Remove the "date" part:
df[0] = df[0].dt.time
# 0         1                2                                 diff
# 01:58:30  USER INPUT       "Hello "                          00:00:00
# 01:58:30  SYSTEM RESPONSE  "Hello. How are you"              00:00:26
# 01:58:56  USER INPUT       "Good thank you. How about you?"  00:00:01
# 01:58:57  SYSTEM RESPONSE  "I am doing great!"               00:00:16
# 01:59:13  USER INPUT       "Thats it"                        00:00:02
# 01:59:15  SYSTEM RESPONSE  "Deal"                            11:30:13
# 13:29:28  USER INPUT       "Deal"                            NaT
As a bonus, you get the times between the interactions.
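If you want the differences as plain seconds, as in the question, the timedelta column can be converted with .dt.total_seconds(). The sketch below is only an illustration: it builds a small frame in memory instead of reading your file, and the 'seconds' column and the response_times variable are names I made up. Under the same assumption as above (each "USER INPUT" row is followed by its "SYSTEM RESPONSE"), filtering on the "USER INPUT" rows leaves only the system's response times.

```python
import pandas as pd

# Hypothetical in-memory sample standing in for the parsed log file.
df = pd.DataFrame({
    0: ['01:58:30', '01:58:30', '01:58:56', '01:58:57'],
    1: ['USER INPUT', 'SYSTEM RESPONSE', 'USER INPUT', 'SYSTEM RESPONSE'],
})

# Parse the time strings, then take next-row minus current-row.
times = pd.to_datetime(df[0], format='%H:%M:%S')
df['diff'] = times.shift(-1) - times

# Convert the timedeltas to float seconds (the last row stays NaN).
df['seconds'] = df['diff'].dt.total_seconds()

# The interval that starts at a USER INPUT row is the system's response time.
response_times = df.loc[df[1] == 'USER INPUT', 'seconds']
print(response_times)
```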
Upvotes: 1