Reputation: 25
I am trying to bin a column and take the sum for rows that satisfy an indexing condition.
Here is an example of the kind of DataFrame I'm working with:
df
A B C D
a aa 0 1
a aa 1 2
a aa 2 10
a bb 0 4
a bb 1 5
a bb 2 60
b aa 0 7
b aa 1 8
b aa 2 90
b bb 0 10
b bb 1 11
b bb 2 12
What I'd like to do is first bin the column of features C
(ie 0,1
is f0
and 2
is f1
) and sum the values in column D
within each bin and still keep the respective A
and B` index pairs intact. The result of which would look like:
A B C D
a aa f0 3
a aa f1 10
a bb f0 9
a bb f1 60
b aa f0 15
b aa f1 90
b bb f0 21
b bb f1 12
I've done the binning using pandas.cut
like so:
cut = pd.cut(df['C'], bins=[0,2,3], right=False, labels=['f0','f1'])
But I am having trouble keeping maintaining the index pairs (columns A,B
) intact using df.groupby(cut)
and I would like to not have to iterate through all possible (A,B)
pairs and bin + sum them myself.
Any help would be appreciated.
Upvotes: 1
Views: 621
Reputation: 8790
You can do a groupby
on ['A','B','C']
after reassigning the values in C
:
df['C'] = np.where(df['C'].isin([0,1]), 'f0', 'f1')
df = df.groupby(['A','B','C']).sum().reset_index()
Result:
A B C D
0 a aa f0 3
1 a aa f1 10
2 a bb f0 9
3 a bb f1 60
4 b aa f0 15
5 b aa f1 90
6 b bb f0 21
7 b bb f1 12
Upvotes: 1
Reputation: 68156
You need to add the additional columns that define your desired groups into your groupby statement:
from io import StringIO
import pandas
df = pandas.read_table(StringIO("""\
A B C D
a aa 0 1
a aa 1 2
a aa 2 10
a bb 0 4
a bb 1 5
a bb 2 60
b aa 0 7
b aa 1 8
b aa 2 90
b bb 0 10
b bb 1 11
b bb 2 12"""), sep='\s+')
output = (
df.assign(C=pandas.cut(df['C'], bins=[0,2,3], right=False, labels=['f0','f1']))
.groupby(['A', 'B', 'C'], as_index=False)
.sum()
)
A B C D
a aa f0 3
a aa f1 10
a bb f0 9
a bb f1 60
b aa f0 15
b aa f1 90
b bb f0 21
b bb f1 12
Upvotes: 1