Reputation: 13
I am trying to update selected datetime64 values in a pandas data frame using the loc method to select rows satisfying a condition. However, instead of assigning the new date-time value it results in NaT.
Here is a simplification of my code that shows the problem:
import pandas as pd
import numpy as np
datetime = (np.datetime64('1899-12-30'), np.datetime64('1989-12-30'), np.datetime64('2199-12-30'))
select = (0, 1, 0)
df = pd.DataFrame(list(zip(datetime, select)),
columns=['date_time', 'select'])
# create a new column by subtracting 180 days
df['new_date'] = df['date_time'] - pd.Timedelta(180, unit='d')
# replace datetime with new date where select is true
df.loc[(df['select'] == 1), ['date_time']] = df.loc[(df['select'] == 1), ['new_date']]
print(df)
# the second element of the date_time column is "NaT", but this is not the desired outcome.
# the desired behaviour is for it to be the same as the second element in the new_date column.
I've also put the code here: http://tpcg.io/vOoi87Gb
Any ideas on how this should be done or why this is not working as intended?
Thanks for reading.
Upvotes: 1
Views: 238
Reputation: 150805
You should drop []
around the column name:
df.loc[(df['select'] == 1), 'date_time'] = df.loc[(df['select'] == 1), 'new_date']
You can also drop the second boolean index:
df.loc[(df['select'] == 1), 'date_time'] = df['new_date']
Also, np.where
:
df['date_time'] = np.where(df['select']==1, df['new_date'], df['date_time'])
Explanation: df.loc[s, ['col_name']]
slices a dataframe, while df.loc[s, 'col_name']
slices a series. When you do:
dataframe_slice = another_dataframe_slice
Pandas will try to align the index/columns of the two dataframe. In this case, the two slices have no common columns, so the updated dataframe has NaN
values where select==1
.
Upvotes: 1