Reputation: 5234
Let's say I have a pandas DataFrame with a time-related column called "Time". This column holds strings that represent minutes and seconds. For example, the first row's value 125:19 represents 125 minutes and 19 seconds. Its datatype is string.
I want to convert this value to total minutes in a new column "Time_minutes". So 125:19 should become 125.316666666667, stored as a float.
In a similar vein, if the value is 0:00, the corresponding "Time_minutes" value should be 0 (float datatype).
I've done this in SQL using lambdas and index functions. Is there an easier, more straightforward way to do this in Python?
Upvotes: 1
Views: 53
Reputation: 14424
Option 1
If performance is a concern and you are certain that each string ends with ":ss", you can slice Series.str with [:-3] and [-2:] to get the minutes and seconds parts respectively, apply Series.astype to convert both to float, and chain Series.div on the seconds part to divide it by 60.
import pandas as pd
data = {'Time': ['123:19','0:00','0:30']}
df = pd.DataFrame(data)
df['Time_minutes'] = (df['Time'].str[:-3].astype(float) +
df['Time'].str[-2:].astype(float).div(60))
df
Time Time_minutes
0 123:19 123.316667
1 0:00 0.000000
2 0:30 0.500000
This is faster than any of the options based on Series.str.split (see the performance comparison).
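Since the slicing approach depends on every string ending with ":ss", it can be worth checking that assumption first. The following is only a sketch: the regular expression and the choice to raise a ValueError are my own assumptions, not part of the approach above.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30"]})

# Assumed format: one or more digits, a colon, then exactly two digits.
ok = df["Time"].str.fullmatch(r"\d+:\d{2}")

# Fail early instead of silently slicing malformed strings.
if not ok.all():
    raise ValueError(f"Unexpected Time values: {df.loc[~ok, 'Time'].tolist()}")

# Same slicing as in Option 1.
df["Time_minutes"] = (df["Time"].str[:-3].astype(float)
                      + df["Time"].str[-2:].astype(float).div(60))
```

If the check fails, one of the split-based options may be the safer choice, since they do not rely on a fixed-width seconds part.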
Option 2
Alternatively, relying on Series.split
, you can set the expand
parameter to True
, which will return the result as a pd.DataFrame
. Now, you can divide by [1, 60]
, leaving the first column (i.e., the integer (or "minutes") part) unchanged through division by 1, and then apply df.sum
on axis=1
.
df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
.astype(float).div([1, 60]).sum(axis=1))
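To make the chain in Option 2 easier to follow, here is the same computation broken into steps. The intermediate names (parts, scaled, total) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30"]})

# Step 1: split on ":" into a two-column DataFrame (column 0 = minutes,
# column 1 = seconds) and convert both columns to float.
parts = df["Time"].str.split(":", expand=True).astype(float)

# Step 2: divide column 0 by 1 (unchanged) and column 1 by 60 (seconds -> minutes).
scaled = parts.div([1, 60])

# Step 3: add the two columns row by row.
total = scaled.sum(axis=1)

print(total)
```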
Option 3
A slightly faster variation on "Option 2" is to apply df.pipe to the result of Series.str.split with expand=True and work with its columns 0 and 1 inside a lambda function.
df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
.pipe(lambda x: x[0].astype(float) +
x[1].astype(float).div(60)))
Options 2 and 3 both avoid the need to create an intermediate variable, such as s in the answer by @AndrejKesely. Both are also marginally faster than that approach.
Performance comparison
import timeit
mysetup = """
import pandas as pd
import numpy as np
np.random.seed(1)
data = {'Time': (np.random.rand(1_000)*100).round(2)}
df = pd.DataFrame(data)
df['Time'] = (df['Time'].apply(lambda x: "{:.2f}".format(x))
.str.replace('.',':', regex=False))
"""
func_dict = {'Option 1 (slice)': "df['Time'].str[:-3].astype(float) + df['Time'].str[-2:].astype(float).div(60)",
'Option 2 (expand)': "df['Time'].str.split(':', expand=True).astype(float).div([1, 60]).sum(axis=1)",
'Option 3 (expand-pipe)': "df['Time'].str.split(':', expand=True).pipe(lambda x: x[0].astype(float) + x[1].astype(float).div(60))",
'Option 4 (intermediate var)': '(s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)'}
for k, v in func_dict.items():
    print(f"{k}: {timeit.timeit(setup=mysetup, stmt=v, number=1_000)}")
# in seconds
Option 1 (slice): 1.1033934000879526
Option 2 (expand): 1.5235498000402004
Option 3 (expand-pipe): 1.456193899968639
Option 4 (intermediate var): 1.8184985001571476
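The timings only mean something if all four expressions return the same values. A quick sanity check on the small example data could look like the sketch below; the use of np.allclose and the variable names are my own choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Time": ["123:19", "0:00", "0:30"]})
t = df["Time"]

# Option 1: slicing
opt1 = t.str[:-3].astype(float) + t.str[-2:].astype(float).div(60)
# Option 2: split with expand=True, divide by [1, 60], sum across columns
opt2 = t.str.split(":", expand=True).astype(float).div([1, 60]).sum(axis=1)
# Option 3: split with expand=True, then pipe
opt3 = t.str.split(":", expand=True).pipe(
    lambda x: x[0].astype(float) + x[1].astype(float).div(60))
# Option 4: split into lists, kept in an intermediate variable
s = t.str.split(":")
opt4 = s.str[0].astype(float) + s.str[1].astype(float) / 60

# Compare with a tolerance, since these are floats.
all_equal = all(np.allclose(opt1, other) for other in (opt2, opt3, opt4))
print(all_equal)
```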
Upvotes: 0
Reputation: 195573
One possible solution is to use .str.split:
df["Converted"] = (s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)
print(df)
Prints:
Time Converted
0 125:19 125.316667
1 0:00 0.000000
2 0:30 0.500000
Upvotes: 1