Reputation: 36
input DataFrame -- >>
output-->>
this is how I want to calculate the sum. Values in Columns A and B can vary.
I want to aggregate this DataFrame in such a way that for the common value of column A, I have to ignore the only current value of but have to consider all value of B and have to take the sum of column c with respect to the previous condition.
for example-- 1-- for A=a and B= 1, I have to ignore the value of column C for B=1 but I have to add the value of Column C where the value of columns A ==a and b!=1, and sum should be 5 (2+3)
2-- for group A=a and B=2, I have to ignore B=2 but have to consider where A=a and B!=2 so for A=a and B!=2 sum will be 4 (1 +3 ).
3--- for group A=a, B=3, I have to ignore B=3 but have to consider where A=a and B!=3 so for A=a and B!=3 sum will be 3 (1+2).
I have to this thing for the value of a million A, One A can have any number of B value.
Everything should be dynamic.
Thanks :)
Upvotes: 0
Views: 1098
Reputation: 18
I just had a similar issue. Maybe you have solved it now, but here is what I did. I would use a function that computes this special sum that you describe.
def exclusion_sum(row, df):
exclusion_mask = (df['A'] == row['A']) & (df['B'] != row['B'])
return df[exclusion_mask]['C'].sum() + row['B']
df['sum'] = df.apply(lambda x: exclusion_sum(x, df), axis=1)
Upvotes: 0
Reputation: 8508
Here's something you can try.
First I am grouping the data based on column A into a new dataframe dfsum
Then I am transforming column C to be the sum. Then subtracting the sum from original column C to get the desired value.
import pandas as pd
df = pd.DataFrame({'A': list('aaabbb'), 'B': [1,2,3,1,2,3], 'C': [1,2,3,4,5,6]})
dfsum = df.groupby(['A'])
n = dfsum['C'].transform('sum')
df['sum'] = (n - df['C'])
print (df)
Output is as follows:
A B C sum
0 a 1 1 5
1 a 2 2 4
2 a 3 3 3
3 b 1 4 11
4 b 2 5 10
5 b 3 6 9
Upvotes: 0
Reputation: 21739
You can do using a simple iteration on the dataframe rows:
# get rows except the current show
ss['sum'] = [ss.iloc[ss.index.difference([x]),1].sum() for x in range(ss.shape[0])]
print(ss)
A B sum
0 a 1 5
1 a 2 4
2 a 3 3
Sample Data
ss = pd.DataFrame({'A': list('aaa'), 'B': [1,2,3]})
Upvotes: 1