psquid

Reputation: 829

Why does `stats.mode()` function truncate the answer on an array of strings?

I'm trying to use scipy's `stats.mode` function to get the most common string out of an array of strings, but the function truncates the strings for some reason.

```python
>>> a
array([' State-gov', ' Self-emp-not-inc', ' Private', ..., ' Private',
       ' Private', ' Self-emp-inc'],
      dtype='|S27')

>>> stats.mode(a)
(array([' P'],
      dtype='|S2'), array([ 22696.]))
```

(The answer should be ' Private'.) Any ideas how I can get the full string? And why is this happening?

Upvotes: 1

Views: 520

Answers (2)

pfctdayelise

Reputation: 5285

It is (or was) happening because of a bug in the implementation: the results array was not initialised with the same dtype as the input array. I submitted a pull request to fix it, which has been accepted, so I expect the fix to ship in scipy 1.9.
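A minimal sketch of the mechanism, assuming plain NumPy behavior (this is not SciPy's actual `mode` code). A fixed-width bytes array silently truncates anything longer than its item size, so a result buffer allocated as `'S2'` (the dtype seen in the question's output) turns `' Private'` into `' P'`. Allocating the buffer with the input's own dtype avoids that. The names `bad_out` and `good_out` are hypothetical, chosen for illustration.

```python
import numpy as np

# Stand-in for the question's input: fixed-width bytes, like dtype='|S27'
a = np.array([b' State-gov', b' Private', b' Private', b' Self-emp-inc'], dtype='S27')

# Buggy pattern (hypothetical): the result buffer is narrower than the input
bad_out = np.zeros(1, dtype='S2')
bad_out[0] = b' Private'    # NumPy silently truncates to 2 bytes
print(bad_out)              # [b' P']

# Fixed pattern: allocate the result with the input array's own dtype
good_out = np.zeros(1, dtype=a.dtype)
good_out[0] = b' Private'   # fits within 27 bytes, nothing is lost
print(good_out)             # [b' Private']
```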

Upvotes: 1

Bitwise

Reputation: 7807

I'm not sure you can solve this with `sp.stats.mode()`; I have also run into this odd behavior before.

For a non-SciPy solution, you can use `collections.Counter`:

```python
import collections
collections.Counter(a).most_common(1)
```

This returns a list containing a single `(string, count)` tuple for the most common string.
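A self-contained sketch of the `Counter` approach, assuming a small made-up sample in place of the real (much larger) array. The second half shows a NumPy-only alternative using `np.unique(..., return_counts=True)`, which is not part of the answer above but also keeps the full strings because the returned values array has the input's string dtype.

```python
import collections
import numpy as np

# Made-up sample standing in for the question's array
a = np.array([' State-gov', ' Self-emp-not-inc', ' Private',
              ' Private', ' Private', ' Self-emp-inc'])

# most_common(1) returns [(value, count)]; unpack the single pair
value, count = collections.Counter(a).most_common(1)[0]
print(repr(value), count)   # value == ' Private', count == 3

# NumPy-only alternative: distinct values (sorted) and how often each occurs
values, counts = np.unique(a, return_counts=True)
print(repr(values[counts.argmax()]), counts.max())
```

The two approaches break ties differently: `Counter.most_common` lists equal counts in first-encountered order, while `argmax` over `np.unique`'s sorted output picks the lexicographically smallest tied value.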

Upvotes: 3
