Century Xu

Reputation: 95

How to iterate over a Series with a MultiIndex in pandas

I am a beginner with pandas, and I want to implement a decision tree algorithm with it. First, I read the test data into a pandas.DataFrame, which looks like this:

In [4]: df = pd.read_csv('test.txt', sep = '\t')

In [5]: df
Out[5]:
  Chocolate Vanilla Strawberry Peanut
0         Y       N          Y      Y
1         N       Y          Y      N
2         N       N          N      N
3         Y       Y          Y      Y
4         Y       Y          N      Y
5         N       N          N      N
6         Y       Y          Y      Y
7         N       Y          N      N
8         Y       N          Y      N
9         Y       N          Y      Y

Then I group by 'Peanut' and 'Chocolate', and this is what I get:

In [15]: df2 = df.groupby(['Peanut', 'Chocolate'])

In [16]: serie1 = df2.size()

In [17]: serie1
Out[17]:
Peanut  Chocolate
N       N            4
        Y            1
Y       Y            5
dtype: int64

Now serie1 is a Series. I can access the values of serie1, but I cannot get the corresponding values of 'Peanut' and 'Chocolate'. How can I get the counts in serie1 and the values of 'Peanut' and 'Chocolate' at the same time?

Upvotes: 2

Views: 1981

Answers (1)

agold

Reputation: 6276

You can use the index of the Series:

>>> serie1.index
MultiIndex(levels=[[u'N', u'Y'], [u'N', u'Y']],
           labels=[[0, 0, 1], [0, 1, 1]],
           names=[u'Peanut', u'Chocolate'])

The index stores the level names (names), the unique values of each level (levels), and, for each row, integer positions into those levels (labels, which was renamed to codes in pandas 0.24 and later). So for 'Peanut', the value in the first row is levels[0][labels[0][0]], which is 'N'. The value of 'Chocolate' in the last row is levels[1][labels[1][2]], which is 'Y'.
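As a minimal sketch of that lookup, the snippet below rebuilds the question's two relevant columns inline (an assumption, so that it runs without test.txt) and decodes the same two values. It uses the newer attribute name codes; on pandas older than 0.24 the same data is exposed as labels. The helper names idx, first_peanut, last_chocolate and peanut_values are only illustrative.

```python
import pandas as pd

# Rebuild the 'Chocolate' and 'Peanut' columns from the question inline.
df = pd.DataFrame({
    "Chocolate": list("YNNYYNYNYY"),
    "Peanut":    list("YNNYYNYNNY"),
})
serie1 = df.groupby(["Peanut", "Chocolate"]).size()

idx = serie1.index

# codes[j][i] is the position in levels[j] of the value for row i, level j.
# (On pandas < 0.24 this attribute is called .labels instead of .codes.)
first_peanut = idx.levels[0][idx.codes[0][0]]
last_chocolate = idx.levels[1][idx.codes[1][2]]
print(first_peanut, last_chocolate)

# get_level_values performs the same lookup for every row of one level.
peanut_values = list(idx.get_level_values("Peanut"))
print(peanut_values)
```

In practice get_level_values is usually the more convenient choice, since it avoids indexing into levels by hand.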

I created a small example which loops through the indexes and prints all data:

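# Python 3 syntax; on pandas < 0.24 use index.labels instead of index.codes
# loop over the rows
for i in range(len(serie1)):
    print("Row", i, "Value", serie1.iloc[i], end=" ")
    # loop over the index levels (the former group-by columns)
    for j in range(len(serie1.index.names)):
        name = serie1.index.names[j]
        value = serie1.index.levels[j][serie1.index.codes[j][i]]
        print("Column", name, "Value", value, end=" ")
    print()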

Which results in:

Row 0 Value 4 Column Peanut Value N Column Chocolate Value N
Row 1 Value 1 Column Peanut Value N Column Chocolate Value Y
Row 2 Value 5 Column Peanut Value Y Column Chocolate Value Y
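A shorter alternative, sketched here as a suggestion and not as part of the original approach, is to iterate with Series.items(), which yields each index entry as a tuple alongside its value, or to call reset_index so the index levels become ordinary columns. The data is rebuilt inline (an assumption, in place of test.txt), and the names rows, flat and the column name "count" are illustrative.

```python
import pandas as pd

# Rebuild the 'Chocolate' and 'Peanut' columns from the question inline.
df = pd.DataFrame({
    "Chocolate": list("YNNYYNYNYY"),
    "Peanut":    list("YNNYYNYNNY"),
})
serie1 = df.groupby(["Peanut", "Chocolate"]).size()

rows = []
# With a MultiIndex, each key is a tuple holding one value per level.
for (peanut, chocolate), count in serie1.items():
    rows.append((peanut, chocolate, int(count)))
    print("Peanut", peanut, "Chocolate", chocolate, "Value", count)

# Alternatively, move the index levels into regular columns.
flat = serie1.reset_index(name="count")
print(flat)
```

The tuple unpacking in the for statement gives the values of 'Peanut' and 'Chocolate' together with the count in one step, without touching levels or codes.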

Upvotes: 2
