How to update selected datetime64 values in a pandas dataframe?

Question

I am trying to update selected datetime64 values in a pandas data frame using the loc method to select rows satisfying a condition. However, instead of assigning the new date-time value it results in NaT.

Here is a simplification of my code that shows the problem:

import pandas as pd
import numpy as np

datetime = (np.datetime64('1899-12-30'), np.datetime64('1989-12-30'), np.datetime64('2199-12-30'))

select = (0, 1, 0)

df = pd.DataFrame(list(zip(datetime, select)),
                  columns=['date_time', 'select'])

# create a new column by subtracting 180 days
df['new_date'] = df['date_time'] - pd.Timedelta(180, unit='d')

# replace datetime with new date where select is true
df.loc[(df['select'] == 1), ['date_time']] = df.loc[(df['select'] == 1), ['new_date']]

print(df)
# the second element of the date_time column is "NaT", but this is not the desired outcome.
# the desired behaviour is for it to be the same as the second element in the new_date column.

I've also put the code here: http://tpcg.io/vOoi87Gb

Any ideas on how this should be done or why this is not working as intended?

Thanks for reading.

Quang Hoang · Accepted Answer

You should drop [] around the column name:

df.loc[(df['select'] == 1), 'date_time'] = df.loc[(df['select'] == 1), 'new_date']

You can also drop the second boolean index:

df.loc[(df['select'] == 1), 'date_time'] = df['new_date']

Also, np.where:

df['date_time'] = np.where(df['select']==1, df['new_date'], df['date_time'])

Explanation: df.loc[s, ['col_name']] slices a dataframe, while df.loc[s, 'col_name'] slices a series. When you do:

dataframe_slice = another_dataframe_slice

Pandas will try to align the index/columns of the two dataframe. In this case, the two slices have no common columns, so the updated dataframe has NaN values where select==1.

How to update selected datetime64 values in a pandas dataframe?

Answers (1)

Related Questions