Reputation: 1352
I am working on some event data with date time. there is an ID for each event gsm_id column. For each event, there is a starting date and time matchdatetime column. there will be multiple events occuring and represented by the column eventdatetime column. There is also a column which record PreviousEventTimecolumn. How can i do create a new column to record the following: 1) first event for each "gsm_ID where matchdatetime column and PreviousEventTime column are the same 2) the last event for each gsm_id where "eventdatetime" is the latest.
gsm_id eventdatetime matchdatetime PreviousEventTime
2462794 8/11/2017 18:46 8/11/2017 18:45 8/11/2017 18:45
2462794 8/11/2017 18:49 8/11/2017 18:45 8/11/2017 18:46
2462794 8/11/2017 19:13 8/11/2017 18:45 8/11/2017 18:49
2462794 8/11/2017 19:31 8/11/2017 18:45 8/11/2017 19:13
2462794 8/11/2017 19:40 8/11/2017 18:45 8/11/2017 19:31
2462794 8/11/2017 20:07 8/11/2017 18:45 8/11/2017 19:40
2462794 8/11/2017 20:09 8/11/2017 18:45 8/11/2017 20:07
2462796 8/12/2017 14:23 8/12/2017 14:00 8/12/2017 14:00
2462796 8/12/2017 14:38 8/12/2017 14:00 8/12/2017 14:23
2462796 8/12/2017 14:42 8/12/2017 14:00 8/12/2017 14:38
2462796 8/12/2017 15:08 8/12/2017 14:00 8/12/2017 14:42
2462796 8/12/2017 15:27 8/12/2017 14:00 8/12/2017 15:08
2462795 8/12/2017 17:39 8/12/2017 16:30 8/12/2017 16:30
2462795 8/12/2017 17:44 8/12/2017 16:30 8/12/2017 17:39
Can any one advise how to work out? If you wish to load the file, please download the file below:
https://drive.google.com/open?id=1SWuxm2PtLSphH6lvivPioTlfLe2JhGH1
Upvotes: 1
Views: 174
Reputation: 115
You can utilize pandas groupby objects for both questions.
import pandas as pd
data = pd.read_csv('Data_test.csv')
First find the indices where matchdatetime and PreviousEventTime is equal and then group by "gsm_id". Finally aggregate by the "first" function:
first_equal_dates = df[df['PreviousEventTime'] == df['matchdatetime']].groupby('gsm_id')['eventdatetime'].first()
You can then merge/join this on the original dataframe.
Again just groupby "gsm_id", but use the "last" function instead:
import pandas as pd
data = pd.read_csv('Data_test.csv')
last_event_dates = data.groupby('gsm_id').agg({'eventdatetime': 'last'})
Again just merge/join this on the original dataframe if you want it as a column in the dataframe.
You can also just assign the results in an one-liner:
data.set_index('gsm_id').assign(question_1=first_equal_dates, mylast=last_event_dates).reset_index()
Upvotes: 1
Reputation: 16414
You can use:
In [203]: s1 = df[df['PreviousEventTime'] == df['matchdatetime']].groupby('gsm_id')['eventdatetime'].first()
In [204]: s1
Out[204]:
gsm_id
2462794 2017-08-11 18:46:00
2462795 2017-08-12 17:39:00
2462796 2017-08-12 14:23:00
Name: eventdatetime, dtype: datetime64[ns]
In [205]: s2 = df.groupby('gsm_id')['eventdatetime'].last()
In [206]: s2
Out[206]:
gsm_id
2462794 2017-08-11 20:09:00
2462795 2017-08-12 17:44:00
2462796 2017-08-12 15:27:00
Name: eventdatetime, dtype: datetime64[ns]
Then finally:
df.set_index('gsm_id').assign(myfirst=s1, mylast=s2).reset_index()
Upvotes: 1