PineNuts0

Reputation: 5234

Python: Pandas Dataframe -- Convert String Time Column in mm:ss Format to Total Minutes in Float Format

Let's say I have a pandas DataFrame with a time-related column called "Time". This column holds strings that represent minutes and seconds. For example, the first row value 125:19 represents 125 minutes and 19 seconds. Its datatype is string.

I want to convert this value to total minutes in a new column "Time_minutes". So 125:19 should become 125.316666666667 which should be a float datatype.

In a similar vein, if the value is 0:00, the corresponding "Time_minutes" value should be 0 (float datatype).

I've done this in SQL using lambdas and index functions. But is there an easier/more straightforward way to do this in python?

Upvotes: 1

Views: 53

Answers (2)

ouroboros1

Reputation: 14424

Option 1

If performance is a concern and you are certain that each string ends with ":ss", you can slice via Series.str with [:-3] (the minutes part) and [-2:] (the seconds part), convert both to float with Series.astype, and chain Series.div on the seconds part to divide it by 60.

import pandas as pd

data = {'Time': ['123:19','0:00','0:30']}
df = pd.DataFrame(data)
                          
df['Time_minutes'] = (df['Time'].str[:-3].astype(float) +
                      df['Time'].str[-2:].astype(float).div(60))

df
     Time  Time_minutes
0  123:19    123.316667
1    0:00      0.000000
2    0:30      0.500000

In the timings below, this is faster than every option that relies on Series.str.split.
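Since Option 1 depends on every string ending in ":ss", it may be worth checking that assumption first. The following is only a sketch: the regular expression, the extra malformed row '7:5', and the names is_valid and valid are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# '7:5' is a deliberately malformed row added for illustration.
df = pd.DataFrame({'Time': ['123:19', '0:00', '0:30', '7:5']})

# Assumed format: one or more digits, a colon, then exactly two digits.
is_valid = df['Time'].str.fullmatch(r'\d+:\d{2}')

# Slice only the rows that match, so the float conversion cannot fail.
valid = df.loc[is_valid, 'Time']
minutes = valid.str[:-3].astype(float) + valid.str[-2:].astype(float).div(60)

# Assignment aligns on the index; rows that did not match become NaN.
df['Time_minutes'] = minutes
```

Rows that fail the check end up as NaN rather than raising, so they can be inspected separately with df[~is_valid].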

Option 2

Alternatively, relying on Series.str.split, you can set the expand parameter to True, which returns the result as a pd.DataFrame with one column per part. You can then divide by [1, 60], which leaves the first column (the "minutes" part) unchanged through division by 1 and converts the second column (the "seconds" part) to minutes, and finally apply DataFrame.sum on axis=1.

df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
                      .astype(float).div([1, 60]).sum(axis=1))
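To see what each step of the chain in Option 2 produces, it can be unrolled as in the sketch below. The names parts and scaled are introduced purely for inspection and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['123:19', '0:00', '0:30']})

# Step 1: split on ':' into two columns (0 = minutes, 1 = seconds), as floats.
parts = df['Time'].str.split(':', expand=True).astype(float)

# Step 2: divide column 0 by 1 and column 1 by 60 (the list is matched to the columns).
scaled = parts.div([1, 60])

# Step 3: add the two columns row by row to get total minutes.
df['Time_minutes'] = scaled.sum(axis=1)

print(parts)
print(scaled)
print(df)
```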

Option 3

A slightly faster variation on "Option 2" is to apply DataFrame.pipe to the result of Series.str.split with expand=True and work with its columns 0 and 1 inside a lambda function.

df['Time_minutes'] = (df['Time'].str.split(':', expand=True)
                      .pipe(lambda x: x[0].astype(float) + 
                            x[1].astype(float).div(60)))

Options 2 and 3 both avoid the need to create an intermediate variable, such as s in the answer by @AndrejKesely. In the timings below, both are also marginally faster than that approach.

Performance comparison

import timeit

mysetup = """
import pandas as pd
import numpy as np

np.random.seed(1)

data = {'Time': (np.random.rand(1_000)*100).round(2)}
df = pd.DataFrame(data)
df['Time'] = (df['Time'].apply(lambda x: "{:.2f}".format(x))
              .str.replace('.',':', regex=False))
"""

func_dict = {'Option 1 (slice)': "df['Time'].str[:-3].astype(float) + df['Time'].str[-2:].astype(float).div(60)",
             'Option 2 (expand)': "df['Time'].str.split(':', expand=True).astype(float).div([1, 60]).sum(axis=1)",
             'Option 3 (expand-pipe)': "df['Time'].str.split(':', expand=True).pipe(lambda x: x[0].astype(float) + x[1].astype(float).div(60))",
             'Option 4 (intermediate var)': '(s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)'}

for k, v in func_dict.items():
    print(f"{k}: {timeit.timeit(setup=mysetup, stmt=v, number=1_000)}")

# total time in seconds for 1,000 runs on a 1,000-row DataFrame
Option 1 (slice): 1.1033934000879526
Option 2 (expand): 1.5235498000402004
Option 3 (expand-pipe): 1.456193899968639
Option 4 (intermediate var): 1.8184985001571476

Upvotes: 0

Andrej Kesely

Reputation: 195573

One possible solution is to use .str.split (the := assignment expression requires Python 3.8+):

df["Converted"] = (s := df["Time"].str.split(":")).str[0].astype(float) + (s.str[1].astype(float) / 60)
print(df)

Prints:

     Time   Converted
0  125:19  125.316667
1    0:00    0.000000
2    0:30    0.500000
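If the assignment expression is unavailable or feels too dense, the same logic can be wrapped in a small helper. This is a sketch only: the function name to_minutes is hypothetical and not part of the original answer.

```python
import pandas as pd

def to_minutes(times: pd.Series) -> pd.Series:
    """Convert 'mm:ss' strings to total minutes as floats (hypothetical helper)."""
    # Each element becomes a list such as ['125', '19'].
    parts = times.str.split(':')
    # .str[0] and .str[1] pick the minutes and seconds out of each list.
    return parts.str[0].astype(float) + parts.str[1].astype(float) / 60

df = pd.DataFrame({'Time': ['125:19', '0:00', '0:30']})
df['Converted'] = to_minutes(df['Time'])
print(df)
```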

Upvotes: 1
