Why does the `stats.mode()` function truncate the answer on an array of strings?

Question

I'm trying to use scipy's `stats.mode` function to get the most common string out of an array of strings, but the function is truncating the result for some reason.

>>> a
array([' State-gov', ' Self-emp-not-inc', ' Private', ..., ' Private',
       ' Private', ' Self-emp-inc'],
      dtype='|S27')

>>> stats.mode(a)
(array([' P'],
      dtype='|S2'), array([ 22696.]))

(The answer should be ' Private'.) Any ideas how I can get the full string? And why is this happening?

pfctdayelise · Accepted Answer

It is (or was) happening because of a bug in the implementation: the results array was not being initialised with the same dtype as the input array, so the string did not fit. I submitted a pull request to fix it, which has been accepted, so I suppose it will be fixed in scipy 1.9.
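Until you are on a version with the fix, here is a minimal sketch of the effect and two ways around it. This is not scipy's actual code. The sample array and the names `narrow`, `truncated`, `most_common` and `most_common_counter` are made up for illustration. The first part only shows how numpy silently truncates a string assigned into an array whose fixed-width dtype is too narrow, which is the kind of thing a wrongly typed results array would cause. The second part gets the most common string without `stats.mode`.

```python
from collections import Counter

import numpy as np

# Hypothetical sample data standing in for the array in the question.
a = np.array([' State-gov', ' Self-emp-not-inc', ' Private',
              ' Private', ' Private', ' Self-emp-inc'])

# Illustration of the truncation: a fixed-width string array silently
# cuts off anything that does not fit its dtype on assignment.
narrow = np.zeros(1, dtype='U2')   # room for only 2 characters
narrow[0] = ' Private'
truncated = str(narrow[0])
print(repr(truncated))

# Workaround 1: np.unique returns values with the input's dtype,
# so nothing is cut off.
values, counts = np.unique(a, return_counts=True)
most_common = str(values[np.argmax(counts)])
print(repr(most_common))

# Workaround 2: plain Python counting on the array converted to a list.
most_common_counter, n = Counter(a.tolist()).most_common(1)[0]
print(repr(most_common_counter), n)
```

Note that on a tie `np.unique` with `argmax` picks the value that sorts first, while `Counter.most_common` picks the one seen first, so the two can differ when several strings share the top count.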
