Reputation: 1586
I wan to aggregate the values in in some columns if they share the same value in a specific column of the data frame?
In otherwords, how can I get data frame B from A? In this example, I want to check based on the values in column c1, if there are rows with the same value, then I want to put sum of columns c3-c5 in the new column (c6) in the row with the value "c" in its column c2.
A:
c0 c1 c2 c3 c4 c5 0 1 a d 3 4 0 1 1 a c 0 0 6 2 1 b d 3 1 0 1 1 b c 0 0 1
B:
c0 c1 c2 c3 c4 c5 c6 0 1 a d 3 4 0 1 1 a c 0 0 6 13 2 1 b d 3 1 0 1 1 b c 0 0 1 5
Here is what I have done:
import pandas as pd
import numpy as np
A = pd.read_excel(file_location)
A['c6'] = pd.Series(0, index=A.index)
A.c6[A['c2'] == 'c'] = A.c5 + A.c4[A.c1 == A.c1.shift(-1)] + A.c3[A.c1 == A.c1.shift(-1)]
B = A
Upvotes: 1
Views: 3684
Reputation: 15963
import pandas as pd
import io
import numpy as np
import datetime
data = """
c0 c1 c2 c3 c4 c5
0 1 a d 3 4 0
1 1 a c 0 0 6
2 1 b d 3 1 0
1 1 b c 0 0 1
"""
df = pd.read_csv(io.StringIO(data), delimiter='\s+')
df2 = pd.DataFrame(df.groupby('c1').sum().ix[:,1:].sum(axis=1), columns=['c6']).reset_index()
df3 = pd.merge(df,df2,on='c1').set_index(df.index)
print(df3)
That will give:
c0 c1 c2 c3 c4 c5 c6
0 1 a d 3 4 0 13
1 1 a c 0 0 6 13
2 1 b d 3 1 0 5
1 1 b c 0 0 1 5
Upvotes: 1