Reputation: 79
I have the following pandas dataset containing information about flights
YEAR int64
MONTH int64
DAY int64
DAY_OF_WEEK int64
AIRLINE object
FLIGHT_NUMBER int64
TAIL_NUMBER object
DEPARTURE_TIME object
ARRIVAL_TIME object
ELAPSED_TIME float64
CANCELLED int64
dtype: object
and I want to calculate the time difference in minutes between DEPARTURE_TIME and ARRIVAL_TIME columns. The two columns have 'datetime.time' dtypes and when I try to subtract them using
(f.DEPARTURE_TIME-f.ARRIVAL_TIME)
I get the following error
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
How can I get the time difference in minutes between the two columns? I want to use this difference to subtract from EPASED_TIME column in a later step so I want to be a float.
Edit:
The original data can be found here
The steps I followed to create the dataframe are:
import pandas as pd
# read dataset
flights = pd.read_csv('flights.csv',
dtype = {'DEPARTURE_TIME': object, 'ARRIVAL_TIME': object},
low_memory=False)
# keep only columns of interest
F = flights[['YEAR', 'MONTH', 'DAY',
'DAY_OF_WEEK', 'AIRLINE',
'FLIGHT_NUMBER', 'TAIL_NUMBER',
'DEPARTURE_TIME', 'ARRIVAL_TIME',
'ELAPSED_TIME', 'CANCELLED']]
# drop rows contain na
f = F.dropna()
# format columns to datetime
f['DEPARTURE_TIME'] = pd.to_datetime(f['DEPARTURE_TIME'],format='%H%M', errors='coerce').dt.time
f['ARRIVAL_TIME'] = pd.to_datetime(f['ARRIVAL_TIME'], format='%H%M', errors = 'coerce').dt.time
# calculate difference
(f.DEPARTURE_TIME-f.ARRIVAL_TIME)
Upvotes: 2
Views: 638
Reputation: 120559
Update
Try:
# After pd.to_datetime
departure = pd.to_timedelta(df['DEPARTURE_TIME'].astype(str))
arrival = pd.to_timedelta(df['ARRIVAL_TIME'].astype(str))
diff = departure - arrival
diff = (diff + pd.Timedelta(days=1)
* (diff < pd.Timedelta(days=0))).dt.total_seconds() / 60
print(diff)
# Output
0 1186.0
1 981.0
dtype: float64
Setup:
df = pd.DataFrame({'DEPARTURE_TIME': ['2354', '0002'],
'ARRIVAL_TIME': ['0408', '0741']})
Old answer
Each values of your columns DEPARTURE_TIME
and ARRIVAL_TIME
contains Timestamp
instance but the dtype
is object
.
You have to convert first your columns into datetime64
with pd.to_datetime()
Try:
f.DEPARTURE_TIME = pd.to_datetime(f.DEPARTURE_TIME)
f.ARRIVAL_TIME = pd.to_datetime(f.ARRIVAL_TIME)
(f.DEPARTURE_TIME-f.ARRIVAL_TIME)
Upvotes: 1