db2020
db2020

Reputation: 135

Pandas datetime - keep time only as dtype datetime

I want the time without the date in Pandas. I want to keep the time as dtype datetime64[ns] and not as an object so that I can determine periods between times.

The closest I have gotten is as follows, but it gives back the date in a new column not the time as needed as dtype datetime.

df_pres_mf['time'] = pd.to_datetime(df_pres_mf['time'], format ='%H:%M', errors = 'coerce')   # returns date (1900-01-01) and actual time as a dtype datetime64[ns] format

df_pres_mf['just_time'] = df_pres_mf['time'].dt.date
df_pres_mf['normalised_time'] = df_pres_mf['time'].dt.normalize()
df_pres_mf.head()

Returns the date as 1900-01-01 and not the time that is needed.

Edit: Data

               time
1900-01-01 11:16:00
1900-01-01 15:20:00
1900-01-01 09:55:00
1900-01-01 12:01:00

Upvotes: 3

Views: 2482

Answers (2)

Stryder
Stryder

Reputation: 880

You could do it like Vishnudev suggested but then you would have dtype: object (or even strings, after using dt.strftime), which you said you didn't want.

What you are looking for doesn't exist, but the closest thing that I can get you is converting to timedeltas. Which won't seem like a solution at first but is actually very useful.

Convert it like this:

# sample df
df
>>
                 time
0 2021-02-07 09:22:00
1 2021-05-10 19:45:00
2 2021-01-14 06:53:00
3 2021-05-27 13:42:00
4 2021-01-18 17:28:00

df["timed"] = df.time - df.time.dt.normalize() 
df
>>
 
                 time           timed
0 2021-02-07 09:22:00 0 days 09:22:00  # this is just the time difference
1 2021-05-10 19:45:00 0 days 19:45:00  # since midnight, which is essentially the 
2 2021-01-14 06:53:00 0 days 06:53:00  # same thing as regular time, except
3 2021-05-27 13:42:00 0 days 13:42:00  # that you can go over 24 hours
4 2021-01-18 17:28:00 0 days 17:28:00

this allows you to calculate periods between times like this:

# subtract the last time from the current
df["difference"] = df.timed - df.timed.shift() 
df
Out[48]: 
                 time           timed        difference
0 2021-02-07 09:22:00 0 days 09:22:00               NaT
1 2021-05-10 19:45:00 0 days 19:45:00   0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00  # <-- this is because the last
3 2021-05-27 13:42:00 0 days 13:42:00   0 days 06:49:00  # time was later than the current
4 2021-01-18 17:28:00 0 days 17:28:00   0 days 03:46:00  # (see below)

to get rid of odd differences, make it absolute:

df["abs_difference"] = df.difference.abs()
df
>>
                 time           timed        difference  abs_difference
0 2021-02-07 09:22:00 0 days 09:22:00               NaT             NaT
1 2021-05-10 19:45:00 0 days 19:45:00   0 days 10:23:00 0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00 0 days 12:52:00  ### <<--
3 2021-05-27 13:42:00 0 days 13:42:00   0 days 06:49:00 0 days 06:49:00
4 2021-01-18 17:28:00 0 days 17:28:00   0 days 03:46:00 0 days 03:46:00

Upvotes: 1

Vishnudev Krishnadas
Vishnudev Krishnadas

Reputation: 10970

Use proper formatting according to your date format and convert to datetime

df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')

Format according to the preferred format

df['time'].dt.strftime('%H:%M')

Output

0    11:16
1    15:20
2    09:55
3    12:01
Name: time, dtype: object

Upvotes: 1

Related Questions