Reputation: 89
I have a dataframe with several columns, I want to get the difference in time between two of the columns containing time. To start I have converted the two columns to DateTime objects using pd.to_datetime, but when I subtract the two columns and assign the result to the new column ends up with NaN values.
ops_data_clean_1.loc['Package committed-time'] =
pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc['Flight launched-time'] =
pd.to_datetime(ops_data_clean_1['Flight launched-time'])
ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc['Flight
launched-time'] - ops_data_clean_1.loc['Package committed-time']
ops_data_clean_1.head()
Upvotes: 0
Views: 1666
Reputation: 11917
I think your problem is usage of loc
when you are accessing just a column from the dataframe. You can eliminate the problem just by removing loc
from the code.
See the following toy example,
ops_data_clean_1 = pd.DataFrame()
ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']
ops_data_clean_1['Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1['Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])
ops_data_clean_1['time_to_launch'] = ops_data_clean_1['Flight launched-time'] - ops_data_clean_1['Package committed-time']
ops_data_clean_1.head()
# Output
Package committed-time Flight launched-time time_to_launch
0 2018-01-01 00:00:30 2018-01-01 01:00:30 01:00:00
1 2018-01-01 00:49:00 2018-01-01 02:49:00 02:00:00
2 2018-03-01 00:00:45 2018-03-01 00:54:45 00:54:00
If you want to use loc
you have to select all the rows of the dataframe using :
For example ops_data_clean_1.loc[:, 'Flight launched-time']
Then the code becomes,
ops_data_clean_1 = pd.DataFrame()
ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']
ops_data_clean_1.loc[:, 'Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc[:, 'Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])
ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc[:, 'Flight launched-time'] - ops_data_clean_1.loc[:, 'Package committed-time']
ops_data_clean_1.head()
# Output
Package committed-time Flight launched-time time_to_launch
0 2018-01-01 00:00:30 2018-01-01 01:00:30 01:00:00
1 2018-01-01 00:49:00 2018-01-01 02:49:00 02:00:00
2 2018-03-01 00:00:45 2018-03-01 00:54:45 00:54:00
Upvotes: 1
Reputation: 315
I think your trouble lies in .loc function you use.
.loc['Package committed-time'] basically says, select ROWs, which have value 'Package committed-time', which are none.
But you want to select column with that name. Use simple ops_data_clean_1['Package committed-time'] to access the column or ops_data_clean_1.loc[:,'Package committed-time']
More about .loc here: enter link description here
Upvotes: 1