Nickson Ndangalasi
Nickson Ndangalasi

Reputation: 89

I am getting NaN when I subtract two pandas dataframe columns

I have a dataframe with several columns, I want to get the difference in time between two of the columns containing time. To start I have converted the two columns to DateTime objects using pd.to_datetime, but when I subtract the two columns and assign the result to the new column ends up with NaN values.

ops_data_clean_1.loc['Package committed-time'] = 
pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc['Flight launched-time'] = 
pd.to_datetime(ops_data_clean_1['Flight launched-time'])
ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc['Flight 
launched-time'] - ops_data_clean_1.loc['Package committed-time']
ops_data_clean_1.head()

Upvotes: 0

Views: 1666

Answers (2)

Sreeram TP
Sreeram TP

Reputation: 11917

I think your problem is usage of loc when you are accessing just a column from the dataframe. You can eliminate the problem just by removing loc from the code.

See the following toy example,

ops_data_clean_1 = pd.DataFrame()

ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']

ops_data_clean_1['Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1['Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])

ops_data_clean_1['time_to_launch'] = ops_data_clean_1['Flight launched-time'] - ops_data_clean_1['Package committed-time']

ops_data_clean_1.head()

# Output

Package committed-time  Flight launched-time    time_to_launch
0   2018-01-01 00:00:30 2018-01-01 01:00:30 01:00:00
1   2018-01-01 00:49:00 2018-01-01 02:49:00 02:00:00
2   2018-03-01 00:00:45 2018-03-01 00:54:45 00:54:00

If you want to use loc you have to select all the rows of the dataframe using : For example ops_data_clean_1.loc[:, 'Flight launched-time']

Then the code becomes,

ops_data_clean_1 = pd.DataFrame()

ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']

ops_data_clean_1.loc[:, 'Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc[:, 'Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])

ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc[:, 'Flight launched-time'] - ops_data_clean_1.loc[:, 'Package committed-time']

ops_data_clean_1.head()

# Output

    Package committed-time  Flight launched-time    time_to_launch
0   2018-01-01 00:00:30 2018-01-01 01:00:30 01:00:00
1   2018-01-01 00:49:00 2018-01-01 02:49:00 02:00:00
2   2018-03-01 00:00:45 2018-03-01 00:54:45 00:54:00

Upvotes: 1

Pavel Klammert
Pavel Klammert

Reputation: 315

I think your trouble lies in .loc function you use.

.loc['Package committed-time'] basically says, select ROWs, which have value 'Package committed-time', which are none.

But you want to select column with that name. Use simple ops_data_clean_1['Package committed-time'] to access the column or ops_data_clean_1.loc[:,'Package committed-time']

More about .loc here: enter link description here

Upvotes: 1

Related Questions