Add successive rows in Pandas if they match on some columns

Question

I have a dataframe like the following one:

ID  URL     seconds
1   Email       9
1   Email       3
1   App         5
1   App         9
1   Faceboook   50
1   Faceboook   7
1   Faceboook   39
1   Faceboook   10
1   Email       39
1   Email       5
1   Email       57
1   Faceboook   7
1   Faceboook   32
1   Faceboook   3
2   App         11
2   App         10
2   Email       56
2   Faceboook   9
2   Faceboook   46
2   Faceboook   16
2   Email       21

I want to sum the 'seconds' column for successive views of the same URL by the same ID. That's the result I'm looking for:

ID  URL     seconds
1   Email       12
1   App         14
1   Faceboook   106
1   Email       101
1   Faceboook   42
2   App         21
2   Email       56
2   Faceboook   71
2   Email       21

df.groupBy(['ID', 'URL']).sum() would not work in this case as it would sum all cases of the same URL for the same ID, not only the successive ones.

Any ideas?

jezrael · Accepted Answer

You can use groupby by Series created by compare by ne column URL and shifted, last use cumsum with boolean mask:

print ((df.URL.ne(df.URL.shift())).cumsum())
0     1
1     1
2     2
3     2
4     3
5     3
6     3
7     3
8     4
9     4
10    4
11    5
12    5
13    5
14    6
15    6
16    7
17    8
18    8
19    8
20    9
Name: URL, dtype: int32

print (df['seconds'].groupby([(df.URL.ne(df.URL.shift())).cumsum(), df.ID, df.URL]).sum())

URL  ID  URL      
1    1   Email         12
2    1   App           14
3    1   Faceboook    106
4    1   Email        101
5    1   Faceboook     42
6    2   App           21
7    2   Email         56
8    2   Faceboook     71
9    2   Email         21
Name: seconds, dtype: int64

print (df['seconds'].groupby([(df.URL.ne(df.URL.shift())).cumsum(), df.ID, df.URL])
                    .sum()
                    .reset_index(level=0, drop=True)
                    .reset_index())

   ID        URL  seconds
0   1      Email       12
1   1        App       14
2   1  Faceboook      106
3   1      Email      101
4   1  Faceboook       42
5   2        App       21
6   2      Email       56
7   2  Faceboook       71
8   2      Email       21

Add successive rows in Pandas if they match on some columns

Answers (1)

Related Questions