I'm trying to create a histogram based on the following groupby, dfm.groupby(['ID', 'Readings', 'Condition']).size():
578871001  20110603  True   1
           20110701  True   1
           20110803  True   1
           20110901  True   1
           20110930  True   1
..
324461897  20130214  False  1
           20130318  False  1
           20130416  False  1
           20130516  False  1
           20130617  False  1
532674350  20110616  False  1
           20110718  False  1
           20110818  False  1
           20110916  False  1
           20111017  False  1
           20111115  False  1
           20111219  False  1
However, I'd like to format the output by Condition, counting how many IDs have each number of Readings. Something like this:
True
# of Readings: # of ID
1 : 5
2 : 8
3 : 15
4 : 10
5 : 4
I've tried grouping just by ID and Readings, and transforming by Condition, but have not gotten very far.
Edit:
This is what the dataframe looked like before the groupby:
CustID Condtion Month Reading Consumption
0 108000601 True June 20110606 28320.0
1 108007000 True July 20110705 13760.0
2 108007000 True August 20110804 16240.0
3 108008000 True September 20110901 12560.0
4 108008000 True October 20111004 12400.0
5 108000601 False November 20111101 9440.0
6 108090000 False December 20111205 12160.0
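For anyone who wants to experiment, here is a minimal sketch that rebuilds the sample rows above as a DataFrame. The values are copied from the table; the name df is an assumption chosen to match the answer's code (the question's own frame is dfm, with columns ID/Readings/Condition rather than CustID/Reading/Condtion). It also illustrates why the three-key groupby reports 1 on every line: each (CustID, Reading, Condtion) combination occurs exactly once.

```python
import pandas as pd

# Sample rows from the question's table. The column name "Condtion" keeps
# the question's spelling because the answer's code refers to it.
df = pd.DataFrame({
    "CustID": [108000601, 108007000, 108007000, 108008000,
               108008000, 108000601, 108090000],
    "Condtion": [True, True, True, True, True, False, False],
    "Month": ["June", "July", "August", "September",
              "October", "November", "December"],
    "Reading": [20110606, 20110705, 20110804, 20110901,
                20111004, 20111101, 20111205],
    "Consumption": [28320.0, 13760.0, 16240.0, 12560.0,
                    12400.0, 9440.0, 12160.0],
})

# Every (CustID, Reading, Condtion) combination is unique, so grouping by
# all three keys gives a size of 1 per group, as in the output above.
print(df.groupby(["CustID", "Reading", "Condtion"]).size())
```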
Is this what you are trying to achieve with your groupby? I've used Counter to tally how many customers have each number of readings. For example, for Condtion = False, there are two CustIDs with a single reading each, so the first row of the output is:
Condtion
False     1    2   # Two customers with one reading each.
Then, for Condtion = True, there is one customer with one reading (108000601) and two customers with two readings each. The output for this group is:
Condtion
True 1 1 # One customer with one reading.
2 2 # Two customers with two readings each.
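The counting step on its own, as a small sketch: in the sample table, the per-customer reading counts for the True group are 1, 2 and 2, and Counter maps each reading count to the number of customers that have it.

```python
from collections import Counter

# Readings per customer in the Condtion == True group of the sample data:
# 108000601 -> 1, 108007000 -> 2, 108008000 -> 2
readings_per_customer = [1, 2, 2]

# Keys are reading counts, values are how many customers have that count.
histogram = Counter(readings_per_customer)
print(sorted(histogram.items()))  # [(1, 1), (2, 2)]
```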
from collections import Counter
gb = df.groupby(['Condtion', 'CustID'], as_index=False).Reading.count()
>>> gb
Condtion CustID Reading
0 False 108000601 1
1 False 108090000 1
2 True 108000601 1
3 True 108007000 2
4 True 108008000 2
>>> gb.groupby('Condtion').Reading.apply(lambda group: Counter(group))
Condtion
False 1 2
True 1 1
2 2
dtype: float64
Or, chained together as a single statement:
gb = (df
      .groupby(['Condtion', 'CustID'], as_index=False)['Reading']
      .count()
      .groupby('Condtion')['Reading']
      .apply(lambda group: Counter(group))
)
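A variant of the same two steps, offered as a sketch rather than the code above: it swaps Counter for pandas' SeriesGroupBy.value_counts, which returns integer counts directly (the float64 dtype shown earlier most likely comes from pandas expanding the Counter dicts into columns, where a missing reading count becomes NaN). It then prints the result in the layout sketched in the question. The DataFrame is rebuilt from the question's sample rows; only the needed columns are kept.

```python
import pandas as pd

# Columns copied from the question's sample rows.
df = pd.DataFrame({
    "CustID": [108000601, 108007000, 108007000, 108008000,
               108008000, 108000601, 108090000],
    "Condtion": [True, True, True, True, True, False, False],
    "Reading": [20110606, 20110705, 20110804, 20110901,
                20111004, 20111101, 20111205],
})

# Step 1: number of readings per (Condtion, CustID) pair.
gb = df.groupby(["Condtion", "CustID"], as_index=False)["Reading"].count()

# Step 2: for each Condtion, how many customers have each reading count.
# value_counts yields integer counts and only combinations that occur.
hist = gb.groupby("Condtion")["Reading"].value_counts().sort_index()

# Print in the "# of Readings: # of ID" layout from the question.
for condition, counts in hist.groupby(level="Condtion"):
    print(condition)
    print("# of Readings: # of ID")
    for (_, n_readings), n_ids in counts.items():
        print(f"{n_readings} : {n_ids}")
```

For the sample data this prints one block for False (1 : 2) and one for True (1 : 1 and 2 : 2), matching the Counter-based result.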