user10270654


How to iterate over rows in a time column to calculate the time difference using pandas in Python

I want to calculate the time interval between consecutive rows of the time column, which I import from a CSV file. The CSV file contains a date and a time, and I want to display the difference between each row's time and the previous row's time. My code is:

def time_diff(start, end):
    start.append(pd.to_datetime(data['time'][0],formt = '%H:%M:%S').time())
    end.append(pd.to_datetime(len(data['time']), format='%H:%M:%S').time())
    if isinstance(start, datetime_time): # convert to datetime
        assert isinstance(end, datetime_time)
        start, end = [datetime.combine(datetime.min,i) for t in [start, end]]
    if start <= end: 
        return end - start
    else:
        end += timedelta(1) # +day
        assert end > start
        return end - start
for index, row in data.iterrows():
    start = [datetime.strptime(t,'%H:%M:%S') for t in inex]
    end = [datetime.strptime(t,'%H:%M:%S') for t in index]
    print(time_diff(s, e))
    assert time_diff(s, e) == time_diff(s.time(), e.time())

My CSV file is:

date	time
10/3/2018	6:00:00
10/3/2018	7:00:00
10/3/2018	8:00:00
10/3/2018	9:00:00
10/3/2018	10:00:00
10/3/2018	11:00:00
10/3/2018	12:00:00
10/3/2018	13:45:00
10/3/2018	15:00:00
10/3/2018	16:00:00
10/3/2018	17:00:00
10/3/2018	18:00:00
10/3/2018	19:00:00
10/3/2018	20:00:00
10/3/2018	21:30:00
10/4/2018	6:00:00

My expected output (time difference) is:

time_diff
0
1
1
1
1
1
1
1:45
1:15
1
1
1
1
1
1:30
8:30

This is the output I want to produce with this code, but I don't know how to iterate through the rows to take the difference between two consecutive times. The time difference should be displayed in hours.

Upvotes: 1

Views: 619

Answers (2)

Scott Boston

Reputation: 153460

IIUC:

import pandas as pd
from io import StringIO
txtFile = StringIO("""date  time
10/3/2018   6:00:00
10/3/2018   7:00:00
10/3/2018   8:00:00
10/3/2018   9:00:00
10/3/2018   10:00:00
10/3/2018   11:00:00
10/3/2018   12:00:00
10/3/2018   13:45:00
10/3/2018   15:00:00
10/3/2018   16:00:00
10/3/2018   17:00:00
10/3/2018   18:00:00
10/3/2018   19:00:00
10/3/2018   20:00:00
10/3/2018   21:30:00
10/4/2018   6:00:00""")

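# sep=r'\s+' splits on any run of whitespace, so it handles tabs as well as spaces
df = pd.read_csv(txtFile, sep=r'\s+')

# Combine date and time, take row-to-row differences, and replace the leading NaT with a zero timedelta
pd.to_datetime(df['date'] + ' ' + df['time']).diff().fillna(pd.Timedelta(0))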

Output:

0    00:00:00
1    01:00:00
2    01:00:00
3    01:00:00
4    01:00:00
5    01:00:00
6    01:00:00
7    01:45:00
8    01:15:00
9    01:00:00
10   01:00:00
11   01:00:00
12   01:00:00
13   01:00:00
14   01:30:00
15   08:30:00
dtype: timedelta64[ns]
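
The question asks for the differences in hours (for example 1:45), while the result above is a timedelta Series. A minimal sketch of one way to convert it, assuming a small hand-typed sample of the differences; the names diffs, hours, labels and format_h_mm are illustrative and not part of the original answer:

```python
import pandas as pd

# Hypothetical sample of the differences computed above
diffs = pd.Series(pd.to_timedelta(["00:00:00", "01:00:00", "01:45:00", "08:30:00"]))

# Decimal hours: total seconds divided by 3600
hours = diffs.dt.total_seconds() / 3600


def format_h_mm(td):
    """Format a Timedelta as H:MM (illustrative helper, assumes non-negative values)."""
    total_minutes = int(td.total_seconds() // 60)
    h, m = divmod(total_minutes, 60)
    return f"{h}:{m:02d}"


# String labels in the H:MM style used in the question
labels = diffs.map(format_h_mm)

print(hours.tolist())
print(labels.tolist())
```

Decimal hours are convenient for further arithmetic; the H:MM strings are only for display.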

Upvotes: 1

Sander van den Oord

Reputation: 12808

1) Read your CSV (assuming it has a header row and is tab-separated) into a pandas DataFrame:

import pandas as pd

df = pd.read_csv('your_file.csv', header=0, sep='\t')

2) You should now have a DataFrame with a date column and a time column. Combine the two into a single pandas datetime column:

df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])

3) Use shift() to get the date_time of the previous row, then subtract it from the date_time of the current row:

df['time_diff'] = df['date_time'] - df['date_time'].shift()

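4) The first value is NaT ("Not a Time") because the first row has no row above it. Fill it with a zero timedelta, assigning the result back to the column instead of calling fillna in place on the selected column:

df['time_diff'] = df['time_diff'].fillna(pd.Timedelta(0))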
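
The four steps can be sketched end to end as follows. This is a minimal example under a few assumptions: the data is an inline three-row excerpt of the question's file (csv_text is a stand-in for 'your_file.csv'), and the dates are month/day/year, which matches the 8:30 gap between the last two rows:

```python
from io import StringIO

import pandas as pd

# Inline excerpt of the question's data; stands in for 'your_file.csv'
csv_text = (
    "date\ttime\n"
    "10/3/2018\t20:00:00\n"
    "10/3/2018\t21:30:00\n"
    "10/4/2018\t6:00:00\n"
)

# 1) Read the tab-separated data with a header row
df = pd.read_csv(StringIO(csv_text), header=0, sep="\t")

# 2) Combine date and time into one datetime column
#    (the month/day/year format is an assumption about the data)
df["date_time"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%m/%d/%Y %H:%M:%S"
)

# 3) Subtract the previous row's timestamp from each row's timestamp
df["time_diff"] = df["date_time"] - df["date_time"].shift()

# 4) The first row has no predecessor, so replace its NaT with a zero timedelta
df["time_diff"] = df["time_diff"].fillna(pd.Timedelta(0))

print(df[["date_time", "time_diff"]])
```

Because the date is part of the timestamp, the gap across midnight needs no special handling.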

Upvotes: 0
