Athena

Reputation: 185

pandas dataframe group by next occurrence of column value

Below is my dataframe:

     info        date       time      file            msg
0   INFO:  2018-09-12  16:10:10:  view.py:          phone   
1   INFO:  2018-09-12  16:10:10:  view.py:         asdasd   
2   INFO:  2018-09-12  16:10:43:  view.py:  contact start   
3   INFO:  2018-09-12  16:10:43:  view.py:    contact end   
4   INFO:  2018-09-12  16:11:36:  view.py:      app start   
5   INFO:  2018-09-12  16:11:36:  view.py:     busy start   
6   INFO:  2018-09-12  16:12:08:  view.py:       busy end   
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end   
8   INFO:  2018-09-12  16:12:08:  view.py:        app end
9   INFO:  2018-09-12  16:12:08:  view.py:          phone
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end

I want to split this dataframe into multiple dataframes based on a value in the msg column. For example, if I split on "phone", each new dataframe should start at a row where msg is "phone" and run up to the row before the next "phone":

df1:

     info        date       time      file            msg
0   INFO:  2018-09-12  16:10:10:  view.py:          phone   
1   INFO:  2018-09-12  16:10:10:  view.py:         asdasd   
2   INFO:  2018-09-12  16:10:43:  view.py:  contact start   
3   INFO:  2018-09-12  16:10:43:  view.py:    contact end   
4   INFO:  2018-09-12  16:11:36:  view.py:      app start   
5   INFO:  2018-09-12  16:11:36:  view.py:     busy start   
6   INFO:  2018-09-12  16:12:08:  view.py:       busy end   
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end   
8   INFO:  2018-09-12  16:12:08:  view.py:        app end

df2:

     info        date       time      file            msg
9   INFO:  2018-09-12  16:12:08:  view.py:          phone
7   INFO:  2018-09-12  16:12:08:  view.py:    contact end

Upvotes: 1

Views: 196

Answers (1)

jpp

Reputation: 164693

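Use a dictionary to hold a variable number of related dataframes instead of separate names such as df1, df2. You can build it by combining GroupBy with cumsum: comparing msg to 'phone' flags the first row of each block, and the cumulative sum of those flags gives every row a block number to group on: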

d = dict(tuple(df.groupby(df['msg'].eq('phone').cumsum())))

Then access your dataframes via d[1], d[2], ..., d[n].

Result:

{1:  info        date       time      file           msg
 0  INFO:  2018-09-12  16:10:10:  view.py:         phone
 1  INFO:  2018-09-12  16:10:10:  view.py:        asdasd
 2  INFO:  2018-09-12  16:10:43:  view.py:  contactstart
 3  INFO:  2018-09-12  16:10:43:  view.py:    contactend
 4  INFO:  2018-09-12  16:11:36:  view.py:      appstart
 5  INFO:  2018-09-12  16:11:36:  view.py:     busystart
 6  INFO:  2018-09-12  16:12:08:  view.py:       busyend
 7  INFO:  2018-09-12  16:12:08:  view.py:    contactend
 8  INFO:  2018-09-12  16:12:08:  view.py:        append,

 2:  info        date       time      file         msg
 9  INFO:  2018-09-12  16:12:08:  view.py:       phone
 7  INFO:  2018-09-12  16:12:08:  view.py:  contactend}
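
To make the intermediate steps visible, here is a minimal self-contained sketch of the same idea. It assumes a toy frame that keeps only the msg column from the question and uses a default integer index; the name block_id is only illustrative.

```python
import pandas as pd

# Toy frame: only the msg column from the question, with a default index.
df = pd.DataFrame(
    {"msg": ["phone", "asdasd", "contact start", "contact end", "app start",
             "busy start", "busy end", "contact end", "app end",
             "phone", "contact end"]}
)

# True on every row that starts a new block.
starts = df["msg"].eq("phone")

# The running total of the flags gives each row the number of its block.
block_id = starts.cumsum()

# Iterating a GroupBy yields (key, sub-frame) pairs, which dict() accepts.
d = dict(tuple(df.groupby(block_id)))

for key, frame in d.items():
    print(int(key), len(frame), frame["msg"].iloc[0])
```

Any rows that come before the first "phone" would get block number 0, so in that case the dictionary would also contain a d[0] entry holding those leading rows.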

Upvotes: 2
