I want to calculate the time interval between consecutive rows of the time column imported from a CSV file. The file contains a date column and a time column, and I want to display the difference between the time in each row and the time in the row before it. My code is:-
def time_diff(start, end):
    start.append(pd.to_datetime(data['time'][0],formt = '%H:%M:%S').time())
    end.append(pd.to_datetime(len(data['time']), format='%H:%M:%S').time())
    if isinstance(start, datetime_time): # convert to datetime
        assert isinstance(end, datetime_time)
        start, end = [datetime.combine(datetime.min,i) for t in [start, end]]
    if start <= end:
        return end - start
    else:
        end += timedelta(1) # +day
        assert end > start
        return end - start
for index, row in data.iterrows():
    start = [datetime.strptime(t,'%H:%M:%S') for t in inex]
    end = [datetime.strptime(t,'%H:%M:%S') for t in index]
    print(time_diff(s, e))
    assert time_diff(s, e) == time_diff(s.time(), e.time())
My CSV file is:-
date time
10/3/2018 6:00:00
10/3/2018 7:00:00
10/3/2018 8:00:00
10/3/2018 9:00:00
10/3/2018 10:00:00
10/3/2018 11:00:00
10/3/2018 12:00:00
10/3/2018 13:45:00
10/3/2018 15:00:00
10/3/2018 16:00:00
10/3/2018 17:00:00
10/3/2018 18:00:00
10/3/2018 19:00:00
10/3/2018 20:00:00
10/3/2018 21:30:00
10/4/2018 6:00:00
My expected output (time difference) is:-
time_diff
0
1
1
1
1
1
1
1:45
1:15
1
1
1
1
1
1:30
8:30
This is the output I want to produce with this code, but I don't know how to iterate through the rows to take the difference between two consecutive times. The time difference should be displayed in hours.
Upvotes: 1
Views: 619
Reputation: 153460
IIUC:
import pandas as pd
from io import StringIO
txtFile = StringIO("""date time
10/3/2018 6:00:00
10/3/2018 7:00:00
10/3/2018 8:00:00
10/3/2018 9:00:00
10/3/2018 10:00:00
10/3/2018 11:00:00
10/3/2018 12:00:00
10/3/2018 13:45:00
10/3/2018 15:00:00
10/3/2018 16:00:00
10/3/2018 17:00:00
10/3/2018 18:00:00
10/3/2018 19:00:00
10/3/2018 20:00:00
10/3/2018 21:30:00
10/4/2018 6:00:00""")
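# Split on any run of whitespace so the sample parses whether the columns are separated by tabs or spaces
df = pd.read_csv(txtFile, sep=r'\s+')
# Combine date and time, take the row-to-row difference, and fill the first row (no predecessor) with a zero Timedelta
pd.to_datetime(df['date'] + ' ' + df['time']).diff().fillna(pd.Timedelta(0))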
Output:
0 00:00:00
1 01:00:00
2 01:00:00
3 01:00:00
4 01:00:00
5 01:00:00
6 01:00:00
7 01:45:00
8 01:15:00
9 01:00:00
10 01:00:00
11 01:00:00
12 01:00:00
13 01:00:00
14 01:30:00
15 08:30:00
dtype: timedelta64[ns]
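The expected output in the question shows hours and minutes (for example 1:45) rather than HH:MM:SS. Here is a minimal sketch of one way to get that form, assuming the dates are month/day/year and every difference is non-negative. The helper name `format_hm` is made up for illustration and is not part of pandas, and the sample uses only a few rows from the question:

```python
import pandas as pd
from io import StringIO

# A few rows taken from the question's data, whitespace-separated.
csv_text = StringIO("""date time
10/3/2018 12:00:00
10/3/2018 13:45:00
10/3/2018 21:30:00
10/4/2018 6:00:00""")

df = pd.read_csv(csv_text, sep=r"\s+")

# Assumption: dates are month/day/year, which matches the 8:30 gap in the question.
stamps = pd.to_datetime(df["date"] + " " + df["time"], format="%m/%d/%Y %H:%M:%S")
diffs = stamps.diff().fillna(pd.Timedelta(0))

def format_hm(delta):
    """Render a non-negative Timedelta as 'H:MM' (hypothetical helper)."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"

df["time_diff"] = diffs.map(format_hm)
print(df)
```

The result is a column of strings, so it is suited to display only. Keep the timedelta column around if further arithmetic is needed.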
Upvotes: 1
Reputation: 12808
1) Read your csv (with header and tab-separated?) into a pandas dataframe:
import pandas as pd
df = pd.read_csv('your_file.csv', header=0, sep='\t')
2) If done correctly, you would now have a dataframe with a date column and a time column. Create a pandas datetime column out of these two:
df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])
3) Get the date_time of the row above with shift() and calculate the difference between the date_time value of this row and its row above:
df['time_diff'] = df['date_time'] - df['date_time'].shift()
4) The first value is NaT (Not a Time) since it has no row above. Fill this value with a zero Timedelta:
df['time_diff'] = df['time_diff'].fillna(pd.Timedelta(0))
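The four steps above can be put together as a rough end-to-end sketch. It reads from an in-memory, whitespace-separated sample instead of 'your_file.csv' so that it runs on its own, and it assumes month/day/year dates. The extra `hours` column is an illustrative addition, since the question asks for the difference in hours:

```python
import pandas as pd
from io import StringIO

# Stand-in for 'your_file.csv'; a few rows from the question's data.
sample = StringIO("""date time
10/3/2018 20:00:00
10/3/2018 21:30:00
10/4/2018 6:00:00""")

# Step 1: read the file (any whitespace as separator in this sample).
df = pd.read_csv(sample, header=0, sep=r"\s+")

# Step 2: build one datetime column (assumed month/day/year).
df["date_time"] = pd.to_datetime(df["date"] + " " + df["time"], format="%m/%d/%Y %H:%M:%S")

# Steps 3 and 4: subtract the previous row, then replace the leading NaT with zero.
df["time_diff"] = (df["date_time"] - df["date_time"].shift()).fillna(pd.Timedelta(0))

# Illustrative extra: the same difference as a float number of hours.
df["hours"] = df["time_diff"].dt.total_seconds() / 3600

print(df[["date_time", "time_diff", "hours"]])
```

Subtracting `shift()` gives the same result as calling `diff()` on the column, so either spelling works here.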
Upvotes: 0