Lysis90

Reputation: 77

Convert to datetime with None values

I have a string-typed column that actually holds times. The series contains some None values.

d = {'col1': [1,2,3,4,5,6], 'time': ['07:00:00','07:30:00','07:00:00',None,'08:00:00','09:00:00']}
data = pd.DataFrame(data=d)

I would like to create a new column and extract the hour and minute values (not seconds). Expected output is:

'new_col': ['07:00','07:30','07:00',None,'08:00','09:00']
# instead of None I could also have NaT

The following two attempts do not work:

data['new_col']= data['time'].dt.hour
data['new_col']= np.where(data['time'].notna(),data['time'].hour,None)

AttributeError: 'Series' object has no attribute 'hour'

data['new_col']= np.where(data['time'].notna(),data['time'].apply(lambda x: dt.strptime(x, '%H:%M'),None))

This gives me:

ValueError: unconverted data remains: :00

Upvotes: 2

Views: 2676

Answers (2)

jezrael

Reputation: 862851

At first glance, converting with pd.to_datetime and then calling dt.strftime('%H:%M') seems to work fine:

data['new_col'] = pd.to_datetime(data['time']).dt.strftime('%H:%M')
print (data)
   col1      time new_col
0     1  07:00:00   07:00
1     2  07:30:00   07:30
2     3  07:00:00   07:00
3     4      None     NaT
4     5  08:00:00   08:00
5     6  09:00:00   09:00

But if you inspect the values, the missing entry is neither None nor NaN; it is the string 'NaT':

print (data['new_col'].tolist())
['07:00', '07:30', '07:00', 'NaT', '08:00', '09:00']

So it is necessary to replace 'NaT' with NaN or None:

data['new_col'] = pd.to_datetime(data['time']).dt.strftime('%H:%M').replace('NaT', np.nan)
print (data['new_col'].tolist())
['07:00', '07:30', '07:00', nan, '08:00', '09:00']

data['new_col'] = (pd.to_datetime(data['time'])
                     .dt.strftime('%H:%M')
                     .mask(lambda x: x=='NaT', None))

Or:

data['new_col']= np.where(data['time'].notna(),
                          pd.to_datetime(data['time']).dt.strftime('%H:%M'),
                          None)

print (data)
   col1      time new_col
0     1  07:00:00   07:00
1     2  07:30:00   07:30
2     3  07:00:00   07:00
3     4      None    None
4     5  08:00:00   08:00
5     6  09:00:00   09:00

print (data['new_col'].tolist())
['07:00', '07:30', '07:00', None, '08:00', '09:00']
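The placeholder that strftime produces for a missing value may differ between pandas versions (an assumption worth checking on your own install). A minimal sketch that avoids depending on it masks on the original column rather than comparing against the string 'NaT'. The explicit format='%H:%M:%S' is also an assumption, namely that every non-missing value has that shape:

```python
import pandas as pd

data = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00'],
})

# Assumption: all non-missing values follow HH:MM:SS.
formatted = pd.to_datetime(data['time'], format='%H:%M:%S').dt.strftime('%H:%M')

# Keep the formatted string only where the source value was present;
# every other row becomes a missing value, whatever strftime returned there.
data['new_col'] = formatted.where(data['time'].notna())

print(data['new_col'].isna().tolist())
```

Checking with isna() instead of reading the printed frame makes it clear whether the row is really missing or only looks that way.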

Alternative, working on the strings directly and dropping everything after the last colon:

data['new_col'] = data['time'].str.rsplit(':', n=1).str[0]
print (data)
   col1      time new_col
0     1  07:00:00   07:00
1     2  07:30:00   07:30
2     3  07:00:00   07:00
3     4      None    None
4     5  08:00:00   08:00
5     6  09:00:00   09:00

print (data['new_col'].tolist())
['07:00', '07:30', '07:00', None, '08:00', '09:00']
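If the strings are always zero-padded HH:MM:SS (an assumption, not something the question guarantees), plain slicing is another string-only variant. This is just a sketch of the same idea, not a replacement for the rsplit version above:

```python
import pandas as pd

data = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00'],
})

# Assumption: fixed-width, zero-padded HH:MM:SS strings.
# The first five characters are then exactly HH:MM; missing values stay missing.
data['new_col'] = data['time'].str[:5]

print(data['new_col'].dropna().tolist())
```

Unlike the datetime route, neither string approach validates the values, so a malformed entry such as '7:5:00' would pass through silently.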

Upvotes: 1

tomjn

Reputation: 5389

I think that

data['new_col'] = pd.to_datetime(data['time']).dt.strftime('%H:%M')

gives the output you want (or close)?

First, convert data.time to the datetime64[ns] type with pd.to_datetime. That allows you to use the .dt accessor to perform various datetime-related operations.
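As a rough illustration of that two-step idea, the sketch below parses the column once and then reads components through .dt. The format='%H:%M:%S' argument is an assumption about the data; the names parsed, hours and minutes are only for illustration:

```python
import pandas as pd

data = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6],
    'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00'],
})

# Step 1: parse the strings; None becomes NaT.
# Assumption: all non-missing values follow HH:MM:SS.
parsed = pd.to_datetime(data['time'], format='%H:%M:%S')

# Step 2: the .dt accessor now works because the Series is datetime-like.
hours = parsed.dt.hour      # missing rows come back as NaN
minutes = parsed.dt.minute

print(parsed.isna().tolist())
```

Calling .dt.hour on the original string column fails because the accessor is only available once the values are datetime-like.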

Upvotes: 1
