Reputation: 19
I have a Sample DataFrame
which has 2 Columns ID, Datetime.
ID Datetime
123 12Sep2021 10:00
123 12Sep2021 10:10
123 12Sep2021 10:25
123 12Sep2021 10:40
123 12Sep2021 10:52
123 12Sep2021 11:20
456 01Oct2021 09:00
456 01Oct2021 09:10
456 01Oct2021 09:40
I want to create a New Variable Count which will be updated based on Datetime difference as follows:
For first transaction of an
ID
,Count
will be 1. It will keep on incrementing till Difference is less than 30 minutes. If Difference is greater than 30 minutes,Count
will again set to 1 and for next transactions of thisID
, difference will be taken fromID
which hasCount
1.
The resultant Output will be as follows:
ID Datetime Count
123 12Sep2021 10:00 1
123 12Sep2021 10:10 2
123 12Sep2021 10:25 3
123 12Sep2021 10:40 1
123 12Sep2021 10:52 2
123 12Sep2021 11:20 1
456 01Oct2021 09:00 1
456 01Oct2021 09:10 2
456 01Oct2021 09:40 1
I have tried the following Pandas code :
df1
is the Input DataFrame
df2=df1['Datetime']df1.groupby('ID')['Datetime'].transform('first')
The problem is: how to reset the datetime value everytime when difference > 30
?
Upvotes: 1
Views: 159
Reputation:
My solution is this. I am not sure if there is a simple way of doing this.
d={i:j.Datetime.to_numpy() for i,j in df.groupby("ID")}
di=dict()
for id in d.keys():
n=1
times=d[id]
empty_list=list()
first=times[0]
for time in times:
diff=time-first
if diff >= np.timedelta64(30, 'm'):
first=time
n=1
empty_list.append(n)
di.update({id:empty_list})
n+=1
df["Count"] = df.groupby('ID').apply(lambda g:pd.Series(di[g.name])).to_list()
Upvotes: 1