Reputation: 145
I have a data frame with a bool type column. I would like to obtain the number of True values per id using pandas' groupby and agg functions. I've done this a bunch of times, but it seems the resulting column's type depends on the data frame. Here is an example:
import pandas as pd
d = {'id': [1, 1, 2, 3], 'bool': [True, False, False, True]}
df = pd.DataFrame(data=d)
print(df.groupby(['id']).agg({'bool': 'sum'}))
The output I get from this code is:
id bool
0 1 True
1 2 False
2 3 True
Which is not what I want. Now, if agg tries to sum two True values:
import pandas as pd
d = {'id': [1, 1, 2, 3], 'bool': [True, True, False, True]}
df = pd.DataFrame(data=d)
print(df.groupby(['id']).agg({'bool': 'sum'}))
Then I get:
id bool
0 1 2.00
1 2 0.00
2 3 1.00
Which is how I want.
I've seen situations in which a few rows are of type bool, whereas others are type float. It seems to be related with the number of rows grouped: if only one row, then it shows the bool value; if more than one, resulting type is float. I would like the resulting aggregated columns to always be of type float.
Pandas version is 1.0.1
Upvotes: 5
Views: 9111
Reputation: 1703
You can just use the max function
df.groupby(['id']).agg({'bool': 'max'})
Upvotes: 3
Reputation: 31011
There is a dedicated Numpy function to count non-zero cells (True is counted as 1, False as 0). So you can run:
df.groupby(['id']).bool.agg(lambda gr: np.count_nonzero(gr))
I assume that you want integer number of True values.
Otherwise append .astype(float)
.
Upvotes: 2
Reputation: 3844
You can sum and keep the output as a float in all cases with the following:
import pandas as pd
d = {'id': [1, 1, 2, 3], 'bool': [True, False, False, True]}
df = pd.DataFrame(data=d)
print(df.groupby(['id'])['bool'].sum().astype(float))
Yields the output
id
1 1.0
2 0.0
3 1.0
Name: bool, dtype: float64
Upvotes: 3
Reputation: 51
You can use typecast to float. Use reset_index if you want a separate column for your index at the end
df.groupby(['id']).agg({'bool': 'sum'}).astype(float).reset_index()
Example:
>>> import pandas as pd
>>> d = {'id': [1, 1, 2, 3], 'bool': [True, True, False, True]}
>>> df = pd.DataFrame(data=d)
>>>
>>> df.groupby(['id']).agg({'bool': 'sum'}).astype(float).reset_index()
id bool
0 1 2.0
1 2 0.0
2 3 1.0
>>>
Upvotes: 0