How to aggregate some rows based on their column value in Python

Question

I wan to aggregate the values in in some columns if they share the same value in a specific column of the data frame?

In otherwords, how can I get data frame B from A? In this example, I want to check based on the values in column c1, if there are rows with the same value, then I want to put sum of columns c3-c5 in the new column (c6) in the row with the value "c" in its column c2.

A:

   c0   c1   c2   c3   c4  c5
0  1    a    d    3    4   0 
1  1    a    c    0    0   6
2  1    b    d    3    1   0 
1  1    b    c    0    0   1

B:

   c0   c1   c2   c3   c4  c5  c6
0  1    a    d    3    4   0   
1  1    a    c    0    0   6   13
2  1    b    d    3    1   0   
1  1    b    c    0    0   1   5

Here is what I have done:

import pandas as pd
import numpy as np
A = pd.read_excel(file_location)
A['c6'] = pd.Series(0, index=A.index)
A.c6[A['c2'] == 'c'] = A.c5 + A.c4[A.c1 == A.c1.shift(-1)] + A.c3[A.c1 == A.c1.shift(-1)]
B = A

Leb · Accepted Answer

import pandas as pd
import io
import numpy as np
import datetime

data = """
   c0   c1   c2   c3   c4  c5
0  1    a    d    3    4   0
1  1    a    c    0    0   6
2  1    b    d    3    1   0
1  1    b    c    0    0   1
    """

df = pd.read_csv(io.StringIO(data), delimiter='\s+')

df2 = pd.DataFrame(df.groupby('c1').sum().ix[:,1:].sum(axis=1), columns=['c6']).reset_index()

df3 = pd.merge(df,df2,on='c1').set_index(df.index)

print(df3)

That will give:

   c0 c1 c2  c3  c4  c5  c6
0   1  a  d   3   4   0  13
1   1  a  c   0   0   6  13
2   1  b  d   3   1   0   5
1   1  b  c   0   0   1   5

How to aggregate some rows based on their column value in Python

Answers (1)

Related Questions