Reputation: 829
I'm trying to use scipy's stats.mode function to get the most common string out of an array of strings. But the function is truncating the strings for some reason.
>>> a
array([' State-gov', ' Self-emp-not-inc', ' Private', ..., ' Private',
       ' Private', ' Self-emp-inc'],
      dtype='|S27')
>>> stats.mode(a)
(array([' P'],
      dtype='|S2'), array([ 22696.]))
(The answer should be ' Private'.) Any ideas how I can get the full string? And why is this happening?
Upvotes: 1
Views: 520
Reputation: 5285
It was happening because of a bug in the implementation: the results array was not being initialised with the same dtype as the input array. I submitted a pull request to fix it, which has been accepted, so I expect it will be fixed in scipy 1.9.
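To illustrate the kind of truncation described here, this is a minimal sketch in plain NumPy. It is not scipy's actual code; the array contents and the 'S2' buffer dtype are assumptions chosen to mirror the output in the question.

```python
import numpy as np

# Hypothetical sample data; the real array in the question is much larger.
a = np.array([b' State-gov', b' Private', b' Private'], dtype='S27')

# A result buffer with a narrower string dtype than the input
# (assumed to be 'S2' here) silently truncates whatever is stored in it.
too_narrow = np.empty(1, dtype='S2')
too_narrow[0] = a[1]

# Allocating the buffer with the input's own dtype keeps the full string.
same_dtype = np.empty(1, dtype=a.dtype)
same_dtype[0] = a[1]

print(too_narrow[0])  # b' P'
print(same_dtype[0])  # b' Private'
```

NumPy raises no error when a longer string is assigned into a shorter fixed-width string slot, which is why the result looks plausible but is cut off.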
Upvotes: 1
Reputation: 7807
Not sure you can solve this with sp.stats.mode() - I have also encountered this weird behavior before.
For a non-scipy solution, you can use collections.Counter:
collections.Counter(a).most_common(1)
This returns a one-element list containing a tuple of the most common string and its number of occurrences.
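Here is a self-contained sketch of the Counter approach, with a NumPy-only alternative based on np.unique for comparison. The sample array is made up for illustration, and the np.unique variant is an addition that was not part of the suggestion above.

```python
from collections import Counter

import numpy as np

# Hypothetical sample data standing in for the array in the question.
a = np.array([' State-gov', ' Private', ' Private', ' Self-emp-inc'])

# most_common(1) returns a one-element list of (value, count) tuples,
# so index [0] to unpack the single pair.
value, count = Counter(a.tolist()).most_common(1)[0]
print(value, count)  # ' Private' 2

# NumPy-only alternative: count the unique values, then take the largest.
values, counts = np.unique(a, return_counts=True)
np_value = values[np.argmax(counts)]
np_count = counts.max()
print(np_value, np_count)  # ' Private' 2
```

Both variants keep the full string, because neither copies the result into a buffer with a narrower dtype.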
Upvotes: 3