Kiso

Reputation: 13

How to understand the data structure of a multi-index series?

I'm working through the book 'Python for Data Analysis' by Wes McKinney. At one point, it has this:

by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)

I tried to understand the structure of the Series returned by by_tz_os.size() with:

print 'Name is:', by_tz_os.size().name 
print 'Index is:', by_tz_os.size().index.name 
print by_tz_os.size()[:5]

Here is the result:

Name is: None 
Index is: None 

tz                            
                  Not Windows     245
                  Windows         276
Africa/Cairo      Windows          3
Africa/Casablanca Windows          1
Africa/Ceuta      Windows          2
dtype: int64

What are these 3 columns (call them col1, col2, col3) in the context of a Series? I would have guessed that col1 and col2 are the index, but the index name printed above is None, so that doesn't seem to be the case. I'm confused.

Upvotes: 0

Views: 154

Answers (1)

Martin Valgur

Reputation: 6302

The result of DataFrame.groupby().size() is explained in the pandas groupby documentation. The relevant part:

Another simple aggregation example is to compute the size of each group. This is included in GroupBy as the size method. It returns a Series whose index are the group names and whose values are the sizes of each group.

In [54]: grouped.size()
Out[54]: 
A    B    
bar  one      1
     three    1
     two      1
foo  one      2
     three    1
     two      2
dtype: int64
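To make that structure concrete, here is a minimal sketch on made-up data. The frame df and its "os" column are assumptions standing in for cframe and the operating_system key from the question. It shows that the two left-hand columns of the printout are the two levels of the index, and the right-hand column holds the Series values:

```python
import pandas as pd

# Hypothetical toy data standing in for cframe; the column names are assumptions.
df = pd.DataFrame({
    "tz": ["Africa/Cairo", "Africa/Cairo", "Asia/Tokyo", "Asia/Tokyo", "Asia/Tokyo"],
    "os": ["Windows", "Windows", "Windows", "Not Windows", "Windows"],
})

# One group per (tz, os) pair; the result is a Series of group sizes.
sizes = df.groupby(["tz", "os"]).size()

print(type(sizes).__name__)        # Series
print(type(sizes.index).__name__)  # MultiIndex
print(sizes.index.nlevels)         # 2 (one level per grouping key)
print(sizes.index.tolist())        # the (tz, os) tuples: "col1" and "col2"
print(sizes.values.tolist())       # the group sizes: "col3"
```

Each index entry is a (tz, os) tuple, which is why the printed Series looks like it has three columns even though it holds a single column of values.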

Regarding

print 'Name is:', by_tz_os.size().name
print 'Index is:', by_tz_os.size().index.name

yielding

Name is: None 
Index is: None 

As you already mentioned, by_tz_os.size() is a Series. A Series selected from a DataFrame column, such as cframe["tz"], has its name attribute set to the column label ("tz"). by_tz_os.size(), by contrast, has no name attached to it, most probably because there is no obvious way to name groupby() results in general. That is why by_tz_os.size().name is None here.

by_tz_os.size().index.name returns None simply because the index is a MultiIndex, which carries one name per level rather than a single name. You are expected to use the names attribute instead (which works with both the normal Index and the MultiIndex, by the way).
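A small sketch of the difference between name and names, again on made-up data (df and the "os" column are assumptions, not the book's dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for cframe; the column names are assumptions.
df = pd.DataFrame({
    "tz": ["Africa/Cairo", "Africa/Cairo", "Asia/Tokyo", "Asia/Tokyo", "Asia/Tokyo"],
    "os": ["Windows", "Windows", "Windows", "Not Windows", "Windows"],
})

sizes = df.groupby(["tz", "os"]).size()

print(sizes.name)               # None: size() does not name its result
print(sizes.index.name)         # None: a MultiIndex has no single name
print(list(sizes.index.names))  # ['tz', 'os']: one name per level

# With a single grouping key the index is a plain Index, and both attributes work.
single = df.groupby("tz").size()
print(single.index.name)         # tz
print(list(single.index.names))  # ['tz']

# If you want the Series itself to carry a name, you can set one explicitly.
named = sizes.rename("count")
print(named.name)                # count
```

In the question, the second grouping key appears to be an unnamed array rather than a column label, so the second level would most likely have no name. That would explain why only "tz" shows up in the header of the printed Series.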

Upvotes: 2
