Astrus
Astrus

Reputation: 193

Python Pandas concatenate rows and sum up values

I have a dataframe that look like this :

name    event_code_0    event_code_1100 event_code_1110 event_code_1120 event_code_1120 event_code_1199
0   A   1   0   0   0   0   0
1   A   0   0   1   0   0   0
2   A   0   0   0   1   0   0
3   A   0   0   1   0   0   0
4   A   0   1   0   0   0   0
5   A   0   0   0   0   1   0
6   A   0   0   0   1   0   0
7   A   0   1   0   0   0   0
8   A   1   0   0   0   0   0
9   B   0   0   0   0   0   1
10  B   0   0   0   0   1   0
11  B   0   0   1   0   0   0
12  B   0   1   0   0   0   0
13  B   1   0   0   0   0   0
14  B   0   0   1   0   0   0
15  B   0   0   0   1   0   0
16  B   0   1   0   0   0   0
17  B   0   0   0   0   0   1

I would like to merge the differents rows on the column : 'name' and count the event code.

For instance the dataframe should look like :

name    event_code_0    event_code_1100 event_code_1110 event_code_1120 event_code_1120 event_code_1199
0   A   2   2   2   2   1   0
1   B   1   2   2   1   1   2

I have try different things with .merge but without success. Maybe someone know an easy trick...

Cheers, Astrus

Upvotes: 1

Views: 1195

Answers (2)

jezrael
jezrael

Reputation: 862641

You need groupby and aggregate sum, for name to column add reset_index or parameter as_index=False:

df1 = df.groupby('name').sum().reset_index()
print (df1)
  name  event_code_0  event_code_1100  event_code_1110  event_code_1120  \
0    A             2                2                2                2   
1    B             1                2                2                1   

   event_code_1120  event_code_1199  
0                1                0  
1                1                2  

Or:

df1 = df.groupby('name', as_index=False).sum()
print (df1)
  name  event_code_0  event_code_1100  event_code_1110  event_code_1120  \
0    A             2                2                2                2   
1    B             1                2                2                1   

   event_code_1120  event_code_1199  
0                1                0  
1                1                2  

Upvotes: 4

user2285236
user2285236

Reputation:

You need to group by the name column and take the sum:

df.groupby('name').sum()
Out: 
      event_code_0  event_code_1100  event_code_1110  event_code_1120  \
name                                                                    
A                2                2                2                2   
B                1                2                2                1   

      event_code_1120.1  event_code_1199  
name                                      
A                     1                0  
B                     1                2  

This, by default, sets the name column as index. You can change that by:

df.groupby('name', as_index=False).sum()
Out: 
  name  event_code_0  event_code_1100  event_code_1110  event_code_1120  \
0    A             2                2                2                2   
1    B             1                2                2                1   

   event_code_1120.1  event_code_1199  
0                  1                0  
1                  1                2  

or df.groupby('name').sum().reset_index().

Upvotes: 2

Related Questions