Python_Learner
Python_Learner

Reputation: 19

Difference of Timestamp Rows in Pandas where Difference condition will be updated everytime

I have a Sample DataFrame which has 2 Columns ID, Datetime.

ID          Datetime            
123         12Sep2021 10:00       
123         12Sep2021 10:10 
123         12Sep2021 10:25
123         12Sep2021 10:40
123         12Sep2021 10:52
123         12Sep2021 11:20
456         01Oct2021 09:00
456         01Oct2021 09:10
456         01Oct2021 09:40

I want to create a New Variable Count which will be updated based on Datetime difference as follows:

For first transaction of an ID, Count will be 1. It will keep on incrementing till Difference is less than 30 minutes. If Difference is greater than 30 minutes, Count will again set to 1 and for next transactions of this ID, difference will be taken from ID which has Count 1.

The resultant Output will be as follows:

ID          Datetime              Count      
123         12Sep2021 10:00        1
123         12Sep2021 10:10        2
123         12Sep2021 10:25        3
123         12Sep2021 10:40        1
123         12Sep2021 10:52        2 
123         12Sep2021 11:20        1
456         01Oct2021 09:00        1
456         01Oct2021 09:10        2
456         01Oct2021 09:40        1

I have tried the following Pandas code : df1 is the Input DataFrame

df2=df1['Datetime']df1.groupby('ID')['Datetime'].transform('first')

The problem is: how to reset the datetime value everytime when difference > 30?

Upvotes: 1

Views: 159

Answers (1)

user16386186
user16386186

Reputation:

My solution is this. I am not sure if there is a simple way of doing this.

d={i:j.Datetime.to_numpy() for i,j in  df.groupby("ID")}

di=dict()

for id in d.keys():
    n=1
    times=d[id]
    empty_list=list()
    first=times[0]
    for time in times:
        diff=time-first
        if diff >= np.timedelta64(30, 'm'):
            first=time
            n=1
        empty_list.append(n)
        di.update({id:empty_list})
        n+=1

df["Count"] = df.groupby('ID').apply(lambda g:pd.Series(di[g.name])).to_list()

Upvotes: 1

Related Questions