Victor Ribeiro

Reputation: 628

Is Ruby statsample "mode" function correct?

Ruby's statistics gem (Statsample) seems to be giving the wrong answer. Shouldn't the mode function return the most common item in an array? In the example below, that would be 4.

irb(main):185:0> [1,2,3,2,4,4,4,4].to_vector
=>
#<Daru::Vector:34330120 @name = nil @size = 8 >
    nil
  0   1
  1   2
  2   3
  3   2
  4   4
  5   4
  6   4
  7   4

irb(main):186:0> [1,2,3,2,4,4,4,4].to_vector.mode
=> 2

Why is it returning 2?

ruby 2.1.6p336 (2015-04-13 revision 50298) [x64-mingw32]

statsample (2.0.1)

Upvotes: 2

Views: 155

Answers (2)

Yaoyu Yang

Reputation: 406

I agree with what Jakub said. This is a bug in the mode method of Daru::Vector in the daru gem. All the Vector-related methods in statsample are now based on daru. I have submitted a pull request to daru fixing this bug; hopefully it will be included in the next released version.

Upvotes: 2

Jakub

Reputation: 743

Yes, you are right, Victor! There is a bug in the vector library:

[1,2,3,2,4,4,4,4].to_vector
 => 
#<Daru::Vector:70255750869440 @name = nil @size = 8 >
    nil
  0   1
  1   2
  2   3
  3   2
  4   4
  5   4
  6   4
  7   4

[1,2,3,2,4,4,4,4].to_vector.frequencies
 => {1=>1, 2=>2, 3=>1, 4=>4} 
[1,2,3,2,4,4,4,4].to_vector.frequencies.values
 => [1, 2, 1, 4] 

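The mode method then finds the index of the maximum frequency (index 3, i.e. the 4th position) and returns the element at that index from the underlying data array instead of from the list of distinct values. In your case the element at index 3 of the data is 2. This is the method responsible: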

    def mode
      freqs = frequencies.values
      @data[freqs.index(freqs.max)]
    end
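
To make the index mismatch concrete, here is a minimal plain-Ruby sketch that reproduces the logic without the gem. The helper names frequencies, buggy_mode and fixed_mode are hypothetical stand-ins written for illustration, not daru's actual code, and the corrected version is only one possible way to fix it, not necessarily what the pull request does.

```ruby
# Hypothetical stand-in for the vector's frequency table (not daru's code):
# maps each distinct value to the number of times it occurs.
def frequencies(data)
  data.each_with_object(Hash.new(0)) { |value, counts| counts[value] += 1 }
end

# Mirrors the logic quoted above: the index is computed on the list of
# frequencies, but it is then used to look up the original data.
def buggy_mode(data)
  freqs = frequencies(data).values
  data[freqs.index(freqs.max)]
end

# One possible correction: use the same index on the list of distinct
# values (the hash keys), which lines up with the list of frequencies.
def fixed_mode(data)
  table = frequencies(data)
  freqs = table.values
  table.keys[freqs.index(freqs.max)]
end

data = [1, 2, 3, 2, 4, 4, 4, 4]
puts buggy_mode(data)  # prints 2
puts fixed_mode(data)  # prints 4
```

The frequencies are [1, 2, 1, 4], so the maximum sits at index 3. Index 3 of the distinct values [1, 2, 3, 4] is 4, while index 3 of the original data is 2.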

Workaround

Instead of the mode method, you can use this:

[1,2,3,2,4,4,4,4].to_vector.frequencies.max{|a,b| a[1]<=>b[1]}.first
 => 4 

Upvotes: 3
