NAB0815
NAB0815

Reputation: 471

Conditional datetime column filling in pandas

I have a DateTime column in pandas dataframe and I want to create a new column which conditionally fills based on the datetime_column value.

Example of my dataframe:

datetime_column
2018-09-19 17:00:03
NaT
NaT
NaT
2018-09-20 07:00:30
NaT
NaT
NaT
2018-09-21 13:00:30
NaT
NaT
NaT
Nat
2018-09-22 20:00:30
NaT
NaT
NaT
NaT
2018-09-22 04:00:30

If the value of datetime_column is in between the time 00:00:00 and 09:00:00, it should fill it with the previous value else it should fill in with the same value as the datetime_column value

Expected dataframe:

datetime_column         datetime_column_x
2018-09-19 17:00:03     2018-09-19 17:00:03
NaT
NaT
NaT
2018-09-20 07:00:30     2018-09-19 17:00:03
NaT
NaT
NaT
2018-09-21 13:00:30     2018-09-21 13:00:30
NaT 
NaT
NaT
Nat
2018-09-22 20:00:30     2018-09-22 20:00:30
NaT
NaT
NaT
NaT
2018-09-22 04:00:30     2018-09-22 20:00:30

Tried code:

start_time : '00:00:00'
end_time :   '12:00:00'
df['datetime_column_x'] = np.where((df['datetime'] >= start_time & 
                          df['datetime']<= end_time), df['datetime']-1, 
                          df['datetime'])

But the above code gives me an error:

TypeError: cannot compare a dtyped [datetime64[ns]] array with a scalar of type [bool]

I'd really appreciate if I can get some help with this.

Upvotes: 2

Views: 2438

Answers (2)

Dagorodir
Dagorodir

Reputation: 104

Wen-Ben's answer is comprehensive - but make sure your dates are in the right format to compare to python's datetime library. This won't work with numpy datetime64 objects.

Convert your the dates to datetime or pandas Timestamp objects:

df['datetime'] = pd.to_datetime(df.loc[:, 'datetime'])
# or
df['datetime'] = df.loc[:, 'datetime'].apply(pd.Timestamp)

Then do as Wen-Ben says, comparing the datetime.time attributes.

Upvotes: 0

BENY
BENY

Reputation: 323236

You should using shift here

s=df['datetime_column'].ffill()
df['datetime_column_x']=np.where(df.datetime_column.dt.hour.between(0,9),s.shift(),df['datetime_column'])
df
Out[441]: 
       datetime_column   datetime_column_x
0  2018-09-19 17:00:03 2018-09-19 17:00:03
1                  NaT                 NaT
2                  NaT                 NaT
3                  NaT                 NaT
4  2018-09-20 07:00:30 2018-09-19 17:00:03
5                  NaT                 NaT
6                  NaT                 NaT
7                  NaT                 NaT
8  2018-09-21 13:00:30 2018-09-21 13:00:30
9                  NaT                 NaT
10                 NaT                 NaT
11                 NaT                 NaT
12                 NaT                 NaT
13 2018-09-22 20:00:30 2018-09-22 20:00:30
14                 NaT                 NaT
15                 NaT                 NaT
16                 NaT                 NaT
17                 NaT                 NaT
18 2018-09-22 04:00:30 2018-09-22 20:00:30

Upvotes: 1

Related Questions