user1502668
user1502668

Reputation: 25

Count consecutive rows with minutes difference in dataframe

5I have a dataframe that looks like this:

Name Site Time
Manual BCN 3/10/2022 11:23:13 PM
Manual BCN 3/10/2022 11:38:47 PM
Automatic Madrid 3/10/2022 11:40:32 PM
Manual BCN 3/10/2022 11:39:47 PM
Manual BCN 3/11/2022 12:44:47 AM

It consists of a Name column, Place and Time. What I'm looking for is to count where Name and place are equal and Time is less than 20minutes between instances. In this case output would be Manual,bcn1 ->3 times as the 5th row is an hour away from the other two. The data is sorted by Time.

What I have tried is to groupby with the Name and Place and then apply a diff to Time with no avail.

df['Time'] = pd.to_datetime(df['Time'])
g=( df.groupby(['site','Name'])['Time'].diff().ne(pd.Timedelta(minutes=20))
      .groupby(df['site','Ppath']).cumsum() )
groups = df.groupby(['Site',g])['Time']
new_df = df.assign(count = groups.transform('size'))

This is returning the count of all values not the ones that fulfill the timedelta. The file itself is quite big with multiple Name and site places.

Many thanks

Edit1. To clarify I'm looking at value pairs so in this case the first row with the second one. And then the second one with the third one and so on. I'm exploring a solution with a For filtering by Name and site.

Thanks

Upvotes: 0

Views: 119

Answers (1)

not_speshal
not_speshal

Reputation: 23146

IIUC, try:

df["Time"] = pd.to_datetime(df["Time"])
df = df.sort_values("Time", ignore_index=True)

output = (df.groupby(["Name", "Site"])["Time"].apply(lambda x: x.diff()
                                                                .dt
                                                                .total_seconds()
                                                                .div(60)
                                                                .fillna(0)
                                                                .le(20)
                                                                .sum()
                                                    )
          )

>>> output
Name       Site  
Automatic  Madrid    1
Manual     BCN       3
Name: Time, dtype: int64

Upvotes: 1

Related Questions