alan watt
alan watt

Reputation: 79

Time difference in pandas columns

I have the following pandas dataset containing information about flights

enter image description here

YEAR                int64
MONTH               int64
DAY                 int64
DAY_OF_WEEK         int64
AIRLINE            object
FLIGHT_NUMBER       int64
TAIL_NUMBER        object
DEPARTURE_TIME     object
ARRIVAL_TIME       object
ELAPSED_TIME      float64
CANCELLED           int64
dtype: object

and I want to calculate the time difference in minutes between DEPARTURE_TIME and ARRIVAL_TIME columns. The two columns have 'datetime.time' dtypes and when I try to subtract them using

(f.DEPARTURE_TIME-f.ARRIVAL_TIME)

I get the following error

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

How can I get the time difference in minutes between the two columns? I want to use this difference to subtract from EPASED_TIME column in a later step so I want to be a float.

Edit:

The original data can be found here

The steps I followed to create the dataframe are:

import pandas as pd

# read dataset
flights = pd.read_csv('flights.csv',
                      dtype = {'DEPARTURE_TIME': object, 'ARRIVAL_TIME': object},
                      low_memory=False)
# keep only columns of interest
F = flights[['YEAR', 'MONTH', 'DAY', 
                           'DAY_OF_WEEK', 'AIRLINE',
                           'FLIGHT_NUMBER', 'TAIL_NUMBER',
                           'DEPARTURE_TIME', 'ARRIVAL_TIME', 
                           'ELAPSED_TIME', 'CANCELLED']]
# drop rows contain na 
f = F.dropna()
# format columns to datetime
f['DEPARTURE_TIME'] = pd.to_datetime(f['DEPARTURE_TIME'],format='%H%M', errors='coerce').dt.time

f['ARRIVAL_TIME'] = pd.to_datetime(f['ARRIVAL_TIME'], format='%H%M', errors = 'coerce').dt.time
# calculate difference
(f.DEPARTURE_TIME-f.ARRIVAL_TIME)

Upvotes: 2

Views: 638

Answers (1)

Corralien
Corralien

Reputation: 120559

Update

Try:

# After pd.to_datetime
departure = pd.to_timedelta(df['DEPARTURE_TIME'].astype(str))
arrival = pd.to_timedelta(df['ARRIVAL_TIME'].astype(str))
diff = departure - arrival

diff = (diff + pd.Timedelta(days=1) 
            * (diff < pd.Timedelta(days=0))).dt.total_seconds() / 60
print(diff)

# Output
0    1186.0
1     981.0
dtype: float64

Setup:

df = pd.DataFrame({'DEPARTURE_TIME': ['2354', '0002'],
                   'ARRIVAL_TIME': ['0408', '0741']})

Old answer

Each values of your columns DEPARTURE_TIME and ARRIVAL_TIME contains Timestamp instance but the dtype is object. You have to convert first your columns into datetime64 with pd.to_datetime()

Try:

f.DEPARTURE_TIME = pd.to_datetime(f.DEPARTURE_TIME)
f.ARRIVAL_TIME = pd.to_datetime(f.ARRIVAL_TIME)

(f.DEPARTURE_TIME-f.ARRIVAL_TIME)

Upvotes: 1

Related Questions