Victor Nogueira

Reputation: 159

Calculating time differences in a one-to-many dataframe

I recently downloaded my League of Legends data. I build the following DataFrame:

import pandas as pd

df = pd.DataFrame.from_dict({'DateTime': {
    0: 156102273400,
    1: 156101627200,
    2: 156092208200,
    3: 1559897767000,
    4: 1559890046000,
    5: 1559889968000},
                      'EventType': {
    0: 'LOGOUT_USER',
    1: 'LOGIN',
    2: 'LOGOUT_USER',
    3: 'LOGIN',
    4: 'LOGIN',
    5: 'LOGIN'}})

I get the following df:

>>>df
Index    DateTime          EventType
0        156102273400      LOGOUT_USER
1        156101627200      LOGIN
2        156092208200      LOGOUT_USER
3        1559897767000     LOGIN
4        1559890046000     LOGIN
5        1559889968000     LOGIN

I want to map each LOGOUT_USER to the minimum LOGIN timestamp before the next LOGOUT_USER is encountered. From there I should be able to calculate the total time played.


Ideal output would look as follows:

>>>fixed_df
Index    DateTime          EventType
0        156102273400      LOGOUT_USER
1        156101627200      LOGIN
2        156092208200      LOGOUT_USER
3        1559889968000     LOGIN
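For the follow-up "total time played" step, a minimal sketch, assuming the cleaned frame alternates LOGOUT_USER/LOGIN pairs (newest first, as in the ideal output above) and that DateTime is a Unix epoch in milliseconds — note the sample mixes 11- and 13-digit timestamps, so real data would need consistent units first:

```python
import pandas as pd

# Cleaned frame, copied from the ideal output above.
fixed_df = pd.DataFrame({
    'DateTime': [156102273400, 156101627200, 156092208200, 1559889968000],
    'EventType': ['LOGOUT_USER', 'LOGIN', 'LOGOUT_USER', 'LOGIN'],
})

# With alternating pairs, logouts and logins line up positionally.
logouts = fixed_df.loc[fixed_df['EventType'] == 'LOGOUT_USER', 'DateTime'].to_numpy()
logins = fixed_df.loc[fixed_df['EventType'] == 'LOGIN', 'DateTime'].to_numpy()

# Each session's duration is logout minus its matching login.
session_ms = logouts - logins
total_played = pd.to_timedelta(session_ms.sum(), unit='ms')
```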

Upvotes: 3

Views: 66

Answers (4)

jxc

Reputation: 13998

You can also set up an extra group label g and then call drop_duplicates on ['g', 'EventType'] instead of running a groupby:

df.assign(g=df['EventType'].eq('LOGOUT_USER').cumsum()) \
  .drop_duplicates(['g','EventType'], keep='last') \
  .drop('g', axis=1)

#        DateTime    EventType
#0   156102273400  LOGOUT_USER
#1   156101627200        LOGIN
#2   156092208200  LOGOUT_USER
#5  1559889968000        LOGIN

Upvotes: 0

cs95

Reputation: 402493

I think you're looking for groupby and idxmin.

grouper = df['EventType'].ne(df['EventType'].shift()).cumsum()
df.loc[df.groupby(grouper)['DateTime'].idxmin()]  

        DateTime    EventType
0   156102273400  LOGOUT_USER
1   156101627200        LOGIN
2   156092208200  LOGOUT_USER
5  1559889968000        LOGIN

Upvotes: 3

Quang Hoang

Reputation: 150745

Without groupby, you can combine your logic:

# logouts
log_out = df.EventType.eq('LOGOUT_USER')

# rows whose next event is a login
next_log_in = df.EventType.shift(-1).eq('LOGIN')

# logout immediately followed by a login
markers = log_out & next_log_in

# those logouts and the logins right after them
df[markers | markers.shift(fill_value=False)]

Output:

        DateTime    EventType
0   156102273400  LOGOUT_USER
1   156101627200        LOGIN
2   156092208200  LOGOUT_USER
3  1559897767000        LOGIN

Upvotes: 0

BENY

Reputation: 323236

You can do

df.groupby(df.EventType.eq('LOGOUT_USER').cumsum()).agg(['first','last'])\
    .stack(level=1).reset_index(drop=True)
Out[634]: 
        DateTime    EventType
0   156102273400  LOGOUT_USER
1   156101627200        LOGIN
2   156092208200  LOGOUT_USER
3  1559889968000        LOGIN

Upvotes: 3
