Reputation: 77
I have a string column that actually holds times. The series contains some None values.
import numpy as np
import pandas as pd
d = {'col1': [1, 2, 3, 4, 5, 6], 'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00']}
data = pd.DataFrame(data=d)
I would like to create a new column containing only the hour and minute (without the seconds). The expected output is:
'new_col': ['07:00','07:30','07:00',None,'08:00','09:00']
# instead of None, NaT would also be fine
The following attempts do not work:
data['new_col']= data['time'].dt.hour
data['new_col']= np.where(data['time'].notna(),data['time'].hour,None)
AttributeError: 'Series' object has no attribute 'hour'
data['new_col']= np.where(data['time'].notna(),data['time'].apply(lambda x: dt.strptime(x, '%H:%M'),None))
This gives:
ValueError: unconverted data remains: :00
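The ValueError comes from the format string: '%H:%M' only matches the hour and minute, so strptime complains about the leftover ':00'. (The closing parentheses in that attempt are also misplaced, so None is passed to apply rather than to np.where.) A minimal sketch of a repaired version of that attempt, assuming every non-missing value has the form HH:MM:SS, parses with '%H:%M:%S', reformats with '%H:%M', and handles missing values inside the function instead of with np.where. The helper name to_hh_mm is hypothetical:

```python
from datetime import datetime

import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 6],
     'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00']}
data = pd.DataFrame(data=d)


def to_hh_mm(value):
    """Hypothetical helper: 'HH:MM:SS' string -> 'HH:MM'; missing stays missing."""
    if pd.isna(value):
        return None
    # Parse with the full format (including seconds), then drop seconds on output.
    return datetime.strptime(value, '%H:%M:%S').strftime('%H:%M')


data['new_col'] = data['time'].apply(to_hh_mm)
print(data['new_col'].tolist())
```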
Upvotes: 2
Views: 2676
Reputation: 862851
Convert with pd.to_datetime and then format with dt.strftime('%H:%M'). At first glance everything works:
data['new_col'] = pd.to_datetime(data['time']).dt.strftime('%H:%M')
print (data)
   col1      time  new_col
0     1  07:00:00    07:00
1     2  07:30:00    07:30
2     3  07:00:00    07:00
3     4      None      NaT
4     5  08:00:00    08:00
5     6  09:00:00    09:00
But if you check the values, there is no None and no missing value, only the string 'NaT':
print (data['new_col'].tolist())
['07:00', '07:30', '07:00', 'NaT', '08:00', '09:00']
So it is necessary to replace 'NaT' with NaN or None:
data['new_col'] = pd.to_datetime(data['time']).dt.strftime('%H:%M').replace('NaT', np.nan)
print (data['new_col'].tolist())
['07:00', '07:30', '07:00', nan, '08:00', '09:00']
Or, to get None instead of NaN:
data['new_col'] = (pd.to_datetime(data['time'])
                     .dt.strftime('%H:%M')
                     .mask(lambda x: x == 'NaT', None))
Or:
data['new_col']= np.where(data['time'].notna(),
pd.to_datetime(data['time']).dt.strftime('%H:%M'),
None)
print (data)
   col1      time  new_col
0     1  07:00:00    07:00
1     2  07:30:00    07:30
2     3  07:00:00    07:00
3     4      None     None
4     5  08:00:00    08:00
5     6  09:00:00    09:00
print (data['new_col'].tolist())
['07:00', '07:30', '07:00', None, '08:00', '09:00']
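Depending on the pandas version, dt.strftime may return NaN for NaT instead of the string 'NaT', so comparing against 'NaT' is version-sensitive. A sketch that sidesteps this masks on the parsed datetimes instead of on the formatted strings. The explicit format='%H:%M:%S' is an assumption about the data; the date part pandas fills in is only a placeholder and is discarded by strftime:

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 6],
     'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00']}
data = pd.DataFrame(data=d)

# Explicit format: no format guessing, and None becomes NaT.
parsed = pd.to_datetime(data['time'], format='%H:%M:%S')

# Format to 'HH:MM', then set rows whose parsed value is NaT to a missing value.
# Masking on `parsed` works whether strftime yields 'NaT' or NaN for those rows.
data['new_col'] = parsed.dt.strftime('%H:%M').mask(parsed.isna())
print(data['new_col'].tolist())
```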
Alternatively, work on the strings directly and drop everything after the last colon:
data['new_col'] = data['time'].str.rsplit(':', n=1).str[0]
print (data)
   col1      time  new_col
0     1  07:00:00    07:00
1     2  07:30:00    07:30
2     3  07:00:00    07:00
3     4      None     None
4     5  08:00:00    08:00
5     6  09:00:00    09:00
print (data['new_col'].tolist())
['07:00', '07:30', '07:00', None, '08:00', '09:00']
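If every non-missing value is guaranteed to be a zero-padded HH:MM:SS string (an assumption; it would break on a value like '7:00:00'), positional slicing with the .str accessor is another string-only option, sketched here:

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 6],
     'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00']}
data = pd.DataFrame(data=d)

# Keep the first five characters ('HH:MM'); the .str accessor propagates
# missing values instead of raising on them.
data['new_col'] = data['time'].str[:5]
print(data['new_col'].tolist())
```

Compared with rsplit, slicing is slightly simpler but relies on the fixed width, while rsplit only relies on the seconds being the last colon-separated field.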
Upvotes: 1
Reputation: 5389
I think
data['new_col'] = pd.to_datetime(data['time']).dt.strftime('%H:%M')
gives the output you want (or something close to it).
First, convert data.time to the datetime64[ns] dtype with pd.to_datetime. That lets you use the .dt accessor for datetime-related operations, such as dt.strftime here.
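As a sketch of what the conversion enables, the .dt.hour attempt from the question works once the column has a datetime dtype. The explicit format='%H:%M:%S' is an assumption about the data, and the date part pandas fills in is a placeholder that is not used:

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 6],
     'time': ['07:00:00', '07:30:00', '07:00:00', None, '08:00:00', '09:00:00']}
data = pd.DataFrame(data=d)

parsed = pd.to_datetime(data['time'], format='%H:%M:%S')
print(parsed.dtype)  # a datetime64 dtype, so the .dt accessor is available

# Numeric components; the NaT row comes back as NaN, so these are floats.
hours = parsed.dt.hour
minutes = parsed.dt.minute

# Combined 'HH:MM' string, as suggested above.
hh_mm = parsed.dt.strftime('%H:%M')
print(hh_mm.tolist())
```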
Upvotes: 1