Reputation: 95
I am a beginner with pandas, and I want to implement a decision tree algorithm with it. First, I read the test data into a pandas.DataFrame, which looks like this:
In [4]: df = pd.read_csv('test.txt', sep = '\t')
In [5]: df
Out[5]:
Chocolate Vanilla Strawberry Peanut
0 Y N Y Y
1 N Y Y N
2 N N N N
3 Y Y Y Y
4 Y Y N Y
5 N N N N
6 Y Y Y Y
7 N Y N N
8 Y N Y N
9 Y N Y Y
Then I group by 'Peanut' and 'Chocolate', and this is what I get:
In [15]: df2 = df.groupby(['Peanut', 'Chocolate'])
In [16]: serie1 = df2.size()
In [17]: serie1
Out[17]:
Peanut  Chocolate
N       N            4
        Y            1
Y       Y            5
dtype: int64
Now serie1 is a Series. I can access its values, but I cannot get the corresponding values of 'Peanut' and 'Chocolate'. How can I get each count in serie1 together with its 'Peanut' and 'Chocolate' values?
Upvotes: 2
Views: 1981
Reputation: 6276
You can use the index of the Series:
>>> serie1.index
MultiIndex(levels=[[u'N', u'Y'], [u'N', u'Y']],
labels=[[0, 0, 1], [0, 1, 1]],
names=[u'Peanut', u'Chocolate'])
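The index holds both the column names (names) and their values (levels). Each entry in labels is a position in the corresponding levels list, and the entries at the same position across the labels lists belong to the same row. For example, the first value of 'Peanut' is levels[0][labels[0][0]], which is 'N', and the last value of 'Chocolate' is levels[1][labels[1][2]], which is 'Y'. (From pandas 0.24 on, the labels attribute is called codes.)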
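To check that mapping without indexing by hand, here is a minimal sketch. It rebuilds the question's table inline (the values are copied from the question; skipping test.txt is my assumption) and assumes a pandas version where the integer positions are exposed as codes. get_level_values is a standard MultiIndex method that does the decoding for a whole level at once.

```python
import pandas as pd

# Data copied from the question, built inline instead of reading test.txt.
df = pd.DataFrame({
    "Chocolate":  list("YNNYYNYNYY"),
    "Vanilla":    list("NYNYYNYYNN"),
    "Strawberry": list("YYNYNNYNYY"),
    "Peanut":     list("YNNYYNYNNY"),
})
serie1 = df.groupby(["Peanut", "Chocolate"]).size()
idx = serie1.index

# Manual decoding: a code is a position in the matching levels list.
first_peanut = idx.levels[0][idx.codes[0][0]]
last_chocolate = idx.levels[1][idx.codes[1][2]]

# get_level_values decodes every row of one level in a single call.
peanut_values = list(idx.get_level_values("Peanut"))
chocolate_values = list(idx.get_level_values("Chocolate"))

print(first_peanut, last_chocolate)
print(peanut_values, chocolate_values)
```

Looking a level up by name rather than by number keeps working if the order of the groupby keys changes.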
I created a small example which loops through the indexes and prints all data:
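# written for Python 3; `labels` was renamed to `codes` in pandas 0.24
# loop over the rows
for i in range(len(serie1)):
    print("Row", i, "Value", serie1.iloc[i], end=" ")
    # loop over the index levels (the grouped columns)
    for j in range(len(serie1.index.names)):
        level_value = serie1.index.levels[j][serie1.index.codes[j][i]]
        print("Column", serie1.index.names[j], "Value", level_value, end=" ")
    print()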
Which results in:
Row 0 Value 4 Column Peanut Value N Column Chocolate Value N
Row 1 Value 1 Column Peanut Value N Column Chocolate Value Y
Row 2 Value 5 Column Peanut Value Y Column Chocolate Value Y
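If you only need each count next to its group values, a shorter route may be enough. This is a sketch rather than the only way to do it: it keeps just the two grouped columns from the question's data, and the column name "count" is my own choice. Iterating a Series with a MultiIndex yields each key as a tuple of level values, and reset_index turns the levels back into ordinary columns.

```python
import pandas as pd

# Only the two grouped columns, with values copied from the question.
df = pd.DataFrame({
    "Chocolate": list("YNNYYNYNYY"),
    "Peanut":    list("YNNYYNYNNY"),
})
serie1 = df.groupby(["Peanut", "Chocolate"]).size()

# Each key is a (Peanut, Chocolate) tuple; the value is the group size.
rows = [(peanut, chocolate, int(count))
        for (peanut, chocolate), count in serie1.items()]
for peanut, chocolate, count in rows:
    print("Peanut", peanut, "Chocolate", chocolate, "Count", count)

# Alternatively, move the index levels back into regular columns.
# "count" is a hypothetical column name chosen for this example.
counts = serie1.reset_index(name="count")
print(counts)
```

The resulting DataFrame can then be filtered or sorted like any other, which may be more convenient for the later steps of a decision tree.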
Upvotes: 2