Not getting 0 index from pandas value_counts()

Question

total_val_count = dataset[attr].value_counts()
for i in range(len(total_val_count.index)):
    print(total_val_count[i])

I wrote this code to count the occurrences of each distinct value of an attribute in a DataFrame. The problem is that I cannot access the first value using index 0: I get KeyError: 0 on the very first iteration of the loop.

total_val_count itself contains the expected values, as shown below:

34 2887
4 2708
13 2523
35 2507
33 2407
3 2404
36 2382
26 2378
16 2282
22 2187
21 2141
12 2104
25 2073
5 2052
15 2044
17 2040
14 2027
28 1984
27 1980
23 1979
24 1960
30 1953
29 1936
31 1884
18 1877
7 1858
37 1767
20 1762
11 1740
8 1722
6 1693
32 1692
10 1662
9 1576
19 1308
2 1266
1 175
38 63
dtype: int64

unutbu · Accepted Answer

total_val_count is a Series. The index of the Series holds the distinct values in dataset[attr], and the values of the Series are the number of times each of those values appears in dataset[attr].

When you index a Series with total_val_count[i], Pandas looks for i in the index and returns the associated value. In other words, total_val_count[i] indexes by index value (label), not by ordinal position. Think of a Series as a mapping from the index to the values. With plain indexing, e.g. total_val_count[i], it behaves more like a dict than a list.

You are getting a KeyError because 0 is not a value in the index. To index by ordinal, use total_val_count.iloc[i].
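The difference between label lookup and ordinal lookup can be seen in a minimal sketch. The data and the names counts, lookup_failed, by_label and by_position are made up for illustration and are not from the question; the sketch assumes an integer-valued column, as in the question, so that the index of the counts is made of integers.

```python
import pandas as pd

# Hypothetical data: 34 appears three times, 4 twice, 13 once.
counts = pd.Series([34, 4, 34, 13, 34, 4]).value_counts()

# Plain indexing looks up the *label* 0, which is not in the index.
try:
    counts[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

by_label = counts.loc[34]      # count stored under the label 34
by_position = counts.iloc[0]   # first row by position (the most frequent value)

print(lookup_failed, by_label, by_position)
```

Here .loc makes the label lookup explicit and .iloc makes the positional lookup explicit, which avoids relying on what plain [] does for a given index type.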

Having said that, using for i in range(len(total_val_count.index)) -- or, what amounts to the same thing, for i in range(len(total_val_count)) -- is not recommended. Instead of

for i in range(len(total_val_count)):
    print(total_val_count.iloc[i])

you could use

for value in total_val_count.values:
    print(value)

This is more readable, and allows you to access the desired value as a variable, value, instead of the more cumbersome total_val_count.iloc[i].

Here is an example which shows how to iterate over the values, over the keys, and over both the keys and the values:

import pandas as pd

s = pd.Series([1, 2, 3, 2, 2])
total_val_count = s.value_counts()

print(total_val_count)
# 2    3
# 3    1
# 1    1
# dtype: int64

for value in total_val_count.values:
    print(value)
    # 3
    # 1
    # 1

for key in total_val_count.keys():
    print(key)
    # 2
    # 3
    # 1

# Series.iteritems() was removed in pandas 2.0; items() is the replacement.
for key, value in total_val_count.items():
    print(key, value)
    # 2 3
    # 3 1
    # 1 1

for i in range(len(total_val_count)):
    print(total_val_count.iloc[i])
    # 3
    # 1
    # 1
