Richard

Reputation: 65510

Why does value_counts not show all values present?

I am using pandas 0.18.1 on a large dataframe. I am confused by the behaviour of value_counts(). This is my code:

print(df.phase.value_counts())
def normalise_phase(x):
    print(x)
    return int(str(x).split('/')[0])
df['phase_normalised'] = df['phase'].apply(normalise_phase)

This prints the following:

2      35092
3      26248
1      24646
4      22189
1/2     8295
2/3     4219
0       1829
dtype: int64
1
nan

Two questions:

Upvotes: 4

Views: 5198

Answers (1)

user2285236

value_counts() drops NaN by default, so you need to pass dropna=False for NaNs to be tallied (see the docs). The int64 at the bottom is the dtype of the resulting series, i.e. of the counts. The values themselves form the index, and if you check, the dtype of the index is object.

import numpy as np
import pandas as pd

ser = pd.Series([1, '1/2', '1/2', 3, np.nan, 5])

ser.value_counts(dropna=False)
Out: 
1/2    2
5      1
3      1
1      1
NaN    1
dtype: int64

ser.value_counts(dropna=False).index
Out: Index(['1/2', 5, 3, 1, nan], dtype='object')
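Applied to the question's situation, this could look roughly like the sketch below. It is only an illustration under assumptions: the `phase` series is made-up sample data standing in for `df.phase`, and the NaN check inside `normalise_phase` is a hypothetical addition, not something from the question. The question does not show a traceback, but `int('nan')` raises a ValueError, so the original function would presumably stop at the first NaN it meets.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for df.phase (an assumption, not the real data).
phase = pd.Series([2, 3, '1/2', np.nan, '2/3', 1, np.nan])

# Default behaviour: NaN is left out of the tally.
counts_default = phase.value_counts()

# With dropna=False, NaN gets its own row in the result.
counts_all = phase.value_counts(dropna=False)

def normalise_phase(x):
    # Hypothetical NaN-safe variant of the question's function:
    # pass missing values through instead of calling int() on 'nan'.
    if pd.isnull(x):
        return np.nan
    return int(str(x).split('/')[0])

normalised = phase.apply(normalise_phase)

print(counts_default.sum(), counts_all.sum())
print(normalised.isnull().sum())
```

Comparing the two totals is a quick way to see how many rows the default value_counts() left out.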

Upvotes: 8
