Reputation: 13
I'm working through the book 'Python for Data Analysis' by Wes McKinney. At one point, we have this:
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
I tried to understand the structure of the Series by_tz_os.size() with:
print 'Name is:', by_tz_os.size().name
print 'Index is:', by_tz_os.size().index.name
print by_tz_os.size()[:5]
Here is the result:
Name is: None
Index is: None
tz
                   Not Windows    245
                   Windows        276
Africa/Cairo       Windows          3
Africa/Casablanca  Windows          1
Africa/Ceuta       Windows          2
dtype: int64
What are these three columns (call them col1, col2 and col3) in the context of a Series? I would have guessed that col1 and col2 form the index, but the output above (Index is: None) seems to say otherwise. I'm confused.
Upvotes: 0
Views: 154
Reputation: 6302
The result of DataFrame.groupby().size() is explained in the pandas documentation. The relevant part:
Another simple aggregation example is to compute the size of each group. This is included in GroupBy as the size method. It returns a Series whose index are the group names and whose values are the sizes of each group.
In [54]: grouped.size()
Out[54]:
A    B
bar  one      1
     three    1
     two      1
foo  one      2
     three    1
     two      2
dtype: int64
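To connect this with the three "columns" in the question, here is a minimal sketch. The small cframe below is a made-up stand-in for the book's data, and the "os" column name is my own assumption. It suggests that the first two columns are the two levels of the index and the third column holds the Series values:

```python
import pandas as pd

# Hypothetical stand-in for the book's cframe; the rows are invented.
cframe = pd.DataFrame({
    "tz": ["", "", "Africa/Cairo", "Africa/Cairo", "Africa/Ceuta"],
    "os": ["Windows", "Not Windows", "Windows", "Windows", "Windows"],
})

# Grouping by two keys gives one group per (tz, os) pair.
sizes = cframe.groupby(["tz", "os"]).size()

print(type(sizes).__name__)        # Series
print(type(sizes.index).__name__)  # MultiIndex
print(sizes.index.nlevels)         # 2 index levels: "col1" and "col2"
print(sizes.values.tolist())       # the group sizes: "col3"

# A single value is addressed by a tuple with one label per index level.
print(sizes.loc[("Africa/Cairo", "Windows")])
```

The blank entries at the top of the question's output are then just rows whose tz label is an empty string.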
Regarding
print 'Name is:', by_tz_os.size().name
print 'Index is:', by_tz_os.size().index.name
yielding
Name is: None
Index is: None
As you already mentioned, by_tz_os.size() is a Series. Unlike, for example, the cframe["tz"] Series, whose name attribute is set to "tz", by_tz_os.size() does not have a name attached to it, most probably because there is no obvious way to name groupby() results in general. That is why by_tz_os.size().name is None here.
by_tz_os.size().index.name returns None simply because the index is a MultiIndex, which has one name per level rather than a single name. You are expected to use names instead (which works with both a regular Index and a MultiIndex, by the way).
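A small sketch of the difference between name and names follows. The data is invented, and the labels "count" and "os" are my own choices rather than anything from the book. Because the second grouping key is a plain NumPy array, as in the question, I would expect its index level to come back unnamed:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the book's cframe; the rows are invented.
cframe = pd.DataFrame({
    "tz": ["Africa/Cairo", "Africa/Cairo", "Africa/Ceuta"],
    "a": ["Mozilla (Windows)", "Mozilla (Mac)", "Opera (Windows)"],
})

# As in the question, the second key is an array, so it carries no name.
operating_system = np.where(
    cframe["a"].str.contains("Windows"), "Windows", "Not Windows"
)

sizes = cframe.groupby(["tz", operating_system]).size()
print(sizes.name)               # None: the result Series is unnamed
print(list(sizes.index.names))  # ['tz', None]: one entry per index level

# Names can be attached afterwards if they are wanted.
named = sizes.rename("count").rename_axis(["tz", "os"])
print(named.name)               # count
print(list(named.index.names))  # ['tz', 'os']
```

Naming the levels also gives the unstacked table from the book labelled rows and columns.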
Upvotes: 2