albert
albert

Reputation: 8593

Avoiding nested .apply()

I am dealing with a dataset consisting of different times for the same day/date. The different times represent the occurrence of a given event. In addition to the time, the date is given in another column (see data snippet below for more details).

For further data processing, I need to combine the different times with the date to get a full datetime timestamp. Luckily, I was able to achieve the desired output by implementing nested .apply() calls like this:

import io

import pandas as pd


DATA_STRING = """
date        event_1     event_2     event_3  
2019-12-16  14:01:00    14:27:00    14:47:00
2020-01-16  13:47:00    14:08:00    14:28:00
2020-01-20  12:02:00    12:23:00    12:42:00
"""

TIME_COLUMNS = ['event_1', 'event_2', 'event_3']


def combine_timestamp(row):
    date = row['date']
    times = row[TIME_COLUMNS]
    return times.apply(lambda t: pd.Timestamp.combine(date, t.time()))


file_like = io.StringIO(DATA_STRING)
df = pd.read_csv(file_like, sep='\s+')

df['date'] = pd.to_datetime(df['date'])
df[TIME_COLUMNS] = df[TIME_COLUMNS].apply(pd.to_datetime)
# --> timestamps with date set to today (not a problem as time is relevant only)

df[TIME_COLUMNS] = df.apply(combine_timestamp, axis='columns')

print(df)

Printing:

        date             event_1             event_2             event_3
0 2019-12-16 2019-12-16 14:01:00 2019-12-16 14:27:00 2019-12-16 14:47:00
1 2020-01-16 2020-01-16 13:47:00 2020-01-16 14:08:00 2020-01-16 14:28:00
2 2020-01-20 2020-01-20 12:02:00 2020-01-20 12:23:00 2020-01-20 12:42:00

However, I wondered whether there is a more elegant way to implement this and avoid these nested .apply()` calls.

Upvotes: 2

Views: 79

Answers (2)

anky
anky

Reputation: 75080

I can think of something like this: add the date with the event columns as string and then convert to datetime:

df = pd.read_csv(file_like, sep='\s+')
out = df.assign(**(df['date'].add(' ').to_numpy()[:,None] + df.filter(like='event')))
out = out.apply(pd.to_datetime)

print(out)

         date             event_1             event_2             event_3
0  2019-12-16 2019-12-16 14:01:00 2019-12-16 14:27:00 2019-12-16 14:47:00
1  2020-01-16 2020-01-16 13:47:00 2020-01-16 14:08:00 2020-01-16 14:28:00
2  2020-01-20 2020-01-20 12:02:00 2020-01-20 12:23:00 2020-01-20 12:42:00

Upvotes: 5

Quang Hoang
Quang Hoang

Reputation: 150745

Use pd.to_datetime that can take the whole columns:

(df.filter(like='event').add(df.date, axis=0)
   .apply(pd.to_datetime,format='%H:%M:%S%Y-%m-%d')
)

Another way is pd.to_timedelta for the time and pd.to_datetime for the date:

(df.filter(like='event')
   .apply(pd.to_timedelta)
   .add(pd.to_datetime(df.date), axis=0)
)

Output:

              event_1             event_2             event_3
0 2019-12-16 14:01:00 2019-12-16 14:27:00 2019-12-16 14:47:00
1 2020-01-16 13:47:00 2020-01-16 14:08:00 2020-01-16 14:28:00
2 2020-01-20 12:02:00 2020-01-20 12:23:00 2020-01-20 12:42:00

Upvotes: 4

Related Questions