learner

Reputation: 194

Round up half of the hour in pandas

The round() function in pandas rounds 07:30 down to 07:00, but I want to round up any time at or past the 30-minute mark (inclusive).

E.g.

07:15 to 07:00
05:25 to 05:00
22:30 to 23:00
18:45 to 19:00

How can I achieve this for a column of a DataFrame using pandas?

Upvotes: 2

Views: 816

Answers (2)

mozway

Reputation: 260725

timestamps

You need to use dt.round. This is, however, a bit tricky, as whether xx:30 goes to the previous or the next hour depends on the hour itself (ties go to the even hour). You can force the direction by adding or subtracting a small amount of time (here 1ns):

import pandas as pd

s = pd.to_datetime(pd.Series(['2021-01-02 03:45', '2021-04-25 12:30',
                              '2021-04-25 13:30', '2022-12-04 23:45']))

# xx:30 -> rounding depending on the hour parity (default)
s.dt.round(freq='1h')

0   2021-01-02 04:00:00
1   2021-04-25 12:00:00    <- -30min
2   2021-04-25 14:00:00    <- +30min
3   2022-12-05 00:00:00
dtype: datetime64[ns]


# xx:30 -> xx:00 (force down)
s.sub(pd.Timedelta('1ns')).dt.round(freq='1h')

0   2021-01-02 04:00:00
1   2021-04-25 12:00:00
2   2021-04-25 13:00:00
3   2022-12-05 00:00:00
dtype: datetime64[ns]


# xx:30 -> next hour (force up)
s.add(pd.Timedelta('1ns')).dt.round(freq='1h')

0   2021-01-02 04:00:00
1   2021-04-25 13:00:00
2   2021-04-25 14:00:00
3   2022-12-05 00:00:00
dtype: datetime64[ns]
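
As an alternative to the 1ns nudge, here is a minimal sketch that shifts every value by 30 minutes and then floors to the hour, so xx:30 and later go up and anything earlier goes down. It assumes the column is a datetime64 Series; the sample times are the ones from the question, with an arbitrary date added only so they parse.

```python
import pandas as pd

# Times from the question; the date is arbitrary and only there for parsing.
s = pd.to_datetime(pd.Series(['2021-01-01 07:15', '2021-01-01 05:25',
                              '2021-01-01 22:30', '2021-01-01 18:45']))

# Shift by half an hour, then floor to the hour:
# xx:30 and later land in the next hour, earlier times stay in the current one.
rounded = (s + pd.Timedelta(minutes=30)).dt.floor('h')

print(rounded.dt.strftime('%H:%M').tolist())
```

This avoids relying on the tie-breaking rule of dt.round, at the cost of one extra addition.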

floats

IIUC, you can use divmod (or numpy.modf) to get the integer and decimal part, then perform simple boolean arithmetic:

s = pd.Series([7.15, 5.25, 22.30, 18.45])

s2, r = s.divmod(1)  # or np.modf(s)

s2[r.ge(0.3)] += 1

s2 = s2.astype(int)

Alternative: using mod and boolean to int equivalence:

s2 = s.astype(int)+s.mod(1).ge(0.3)

output:

0     7
1     5
2    23
3    19
dtype: int64

Note on precision: comparing floats is not always reliable because of floating-point arithmetic. For instance, the fractional part of 22.30 is stored as slightly more than 0.3, so gt(0.3) would unexpectedly return True here. To ensure precision, round to 2 digits first:

s.mod(1).round(2).ge(0.3)

or use integers:

s.mod(1).mul(100).round().astype(int).ge(30)
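
To illustrate the precision issue, the sketch below adds one extra value, 5.30, which is not part of the original example. Its fractional part is stored as slightly less than 0.3, so the plain ge(0.3) comparison misses it, while rounding to 2 digits first catches it.

```python
import pandas as pd

# 22.30 is from the example above; 5.30 is an extra illustrative value.
s = pd.Series([22.30, 5.30])

# Raw fractional parts: 5.30 % 1 is stored just below 0.3.
naive = s.mod(1).ge(0.3)

# Rounding to 2 digits before comparing removes the representation error.
safe = s.mod(1).round(2).ge(0.3)

# Integer hour plus one wherever the minutes part is 30 or more.
hours = s.astype(int) + safe

print(naive.tolist())
print(safe.tolist())
print(hours.tolist())
```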

Upvotes: 2

warped

Reputation: 9481

Here is a version that works with timestamps:

import numpy as np
import pandas as pd

# dummy data: 10 random timestamps
df = pd.DataFrame({'time': pd.to_datetime(np.random.randint(0, 10**8, size=10), unit='s')})


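def custom_round(row, col, out):
    # row is a single DataFrame row: ceil at xx:30 or later, floor otherwise
    if row[col].minute >= 30:
        row[out] = row[col].ceil('h')
    else:
        row[out] = row[col].floor('h')
    return row


# apply returns a new DataFrame, so assign the result
df = df.apply(lambda x: custom_round(x, 'time', 'new_time'), axis=1)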

Edit: a vectorized version using numpy:

def custom_round(df, col, out):
    # use the col argument throughout instead of a hard-coded column name
    df[out] = np.where(
        df[col].dt.minute >= 30,
        df[col].dt.ceil('h'),
        df[col].dt.floor('h'),
    )
    return df


df = custom_round(df, 'time', 'new_time')
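
As a quick check, here is a small self-contained sketch that applies the same np.where idea to the four times from the question. The date is arbitrary and only added so the strings parse; the variable name `up` is just for illustration.

```python
import numpy as np
import pandas as pd

# Times from the question; the date is arbitrary.
df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 07:15', '2021-01-01 05:25',
                                           '2021-01-01 22:30', '2021-01-01 18:45'])})

# True where the minutes are 30 or more, i.e. where we want the next hour.
up = df['time'].dt.minute >= 30

# Pick ceil for those rows and floor for the rest.
df['new_time'] = np.where(up, df['time'].dt.ceil('h'), df['time'].dt.floor('h'))

print(df)
```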

Upvotes: 1
