I am using pandas 0.18.1 on a large dataframe. I am confused by the behaviour of value_counts(). This is my code:
print(df.phase.value_counts())

def normalise_phase(x):
    print(x)
    return int(str(x).split('/')[0])

df['phase_normalised'] = df['phase'].apply(normalise_phase)
This prints the following:
2 35092
3 26248
1 24646
4 22189
1/2 8295
2/3 4219
0 1829
dtype: int64
1
nan
Two questions:
Why is nan printing as an output of normalise_phase, when nan is not listed as a value in value_counts?
Why does value_counts show dtype as int64 if it has string values like 1/2 and nan in it too?
Upvotes: 4
You need to pass dropna=False for NaNs to be tallied (see the docs).
int64 is the dtype of the series (the counts of the values). The values themselves are the index. The dtype of the index will be object, if you check.
import numpy as np
import pandas as pd

ser = pd.Series([1, '1/2', '1/2', 3, np.nan, 5])
ser.value_counts(dropna=False)
Out:
1/2 2
5 1
3 1
1 1
NaN 1
dtype: int64
ser.value_counts(dropna=False).index
Out: Index(['1/2', 5, 3, 1, nan], dtype='object')
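Since the column contains NaN, int(str(x).split('/')[0]) will raise a ValueError as soon as apply reaches a missing value, because str(nan) is 'nan'. As a rough sketch (not part of the original code, and the sample series below is made up to mirror the values in the question), one way to handle this is to let missing values pass through unchanged:

```python
import numpy as np
import pandas as pd


def normalise_phase(x):
    """Return the leading phase number as an int, or NaN if x is missing."""
    if pd.isnull(x):
        return np.nan
    return int(str(x).split('/')[0])


# Hypothetical sample data mirroring the mixed values in the question.
phase = pd.Series([2, 3, '1/2', '2/3', np.nan, 0], dtype=object)

# dropna=False makes the missing value visible in the tally.
counts = phase.value_counts(dropna=False)
print(counts)

# NaN passes through untouched instead of raising ValueError.
phase_normalised = phase.apply(normalise_phase)
print(phase_normalised)
```

Whether to keep the NaN, drop those rows, or map them to a sentinel value depends on what the missing phases mean in your data.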
Upvotes: 8