Reputation: 31

most occurring value in a vector

I have a vector with 1000 values, all generated with a random function between 0 and 1. Here is a sample of 100:

x <- runif(100,min=0,max=1)
x
  [1] 0.84620011 0.82525410 0.31622827 0.08040362 0.12894525 0.23997187 0.57177296 0.91691368 0.65751720
 [10] 0.39810175 0.60632205 0.26339035 0.93543618 0.09662383 0.35147739 0.51731042 0.29151612 0.54411769
 [19] 0.73688309 0.26086586 0.37808273 0.19163366 0.62776847 0.70973345 0.31802726 0.69101574 0.50042561
 [28] 0.20768256 0.23555818 0.21015820 0.18221151 0.85593725 0.12916935 0.52222127 0.62269135 0.51267707
 [37] 0.60164023 0.30723904 0.81990231 0.61771762 0.02502631 0.47427724 0.21250040 0.88611710 0.88648546
 [46] 0.92586513 0.57015942 0.33454379 0.03572245 0.68120369 0.48692522 0.76587764 0.55214917 0.31137200
 [55] 0.47170307 0.48639510 0.68922858 0.73506033 0.23541740 0.81793240 0.17184666 0.06670039 0.55664270
 [64] 0.10030533 0.94620061 0.58572228 0.53333567 0.80887841 0.55015406 0.82491114 0.81251132 0.06038019
 [73] 0.10918904 0.84011824 0.33169617 0.03568364 0.07703029 0.15601158 0.31623253 0.25021777 0.77024833
 [82] 0.88588620 0.49044305 0.10165930 0.55494697 0.17455070 0.94458467 0.43135868 0.99313733 0.04482747
 [91] 0.53453604 0.52500493 0.35496966 0.06994880 0.11377845 0.71307042 0.35086237 0.04032254 0.23744845
[100] 0.81131033

Out of all the values in the vector, I need to find the most frequently occurring value (or something close to it). I'm new to R and have no idea how to do this. Please help.

One approach I have in mind: divide the values into ranges and compute the frequency distribution. Would that be helpful?

Upvotes: 2

Answers (3)

MS Berends

Reputation: 5209

To get just the most frequent value, or when the input is discrete data, you can create a table, sort it by count, and return the name of the top entry:

values <- c("a", "a", "c", "c", "c")

names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"

Breaking it down:

# create a table of the values
table(values)
#> a c 
#> 2 3

# sort the table descending on number of occurrences
sort(table(values), decreasing = TRUE)
#> c a 
#> 3 2

# now only keep the first value
sort(table(values), decreasing = TRUE)[1]
#> c 
#> 3

# so the final line:
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"

If you want to get fancy, wrap this in a function that does it for you:

get_mode <- function(x) {
  names(sort(table(x), decreasing = TRUE)[1]) # use the argument x, not the global `values`
}

get_mode(values)
#> [1] "c"
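
With continuous data such as the `runif()` values in the question, every value usually occurs exactly once, so the table approach only becomes informative after grouping the values. A minimal sketch, assuming that rounding to one decimal place is an acceptable grouping (the name `get_mode_rounded` and the `digits` default are just illustrative choices, not part of base R):

```r
# Hypothetical helper: group continuous values by rounding,
# then return the most frequent rounded value.
get_mode_rounded <- function(x, digits = 1) {
  counts <- table(round(x, digits))
  # which.max() returns the first maximum, so ties resolve to the smallest value
  as.numeric(names(counts)[which.max(counts)])
}

set.seed(123)                 # for reproducibility
x <- runif(100)
get_mode_rounded(x, digits = 1)
```

The answer depends on `digits`: more digits means narrower groups and a noisier result.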

Upvotes: 0

RHertel

Reputation: 23788

One way to analyze the distribution of the numbers is to plot a histogram and overlay an approximate probability density curve. This can be done with the ggplot2 library:

set.seed(123) # used here for reproducibility
x <- runif(100) # pseudo-random numbers between 0 and 1
library(ggplot2)
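# after_stat(density) needs ggplot2 >= 3.3.0; older versions use y = ..density..
p <- ggplot(as.data.frame(x), aes(x = x, y = after_stat(density))) +
  geom_histogram(fill = "lightblue", colour = "grey60", bins = 50) +
  geom_density()
p # display the plot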

The value of bins specified in geom_histogram() is the number of bars in the histogram. Try changing this value to obtain a different representation of the distribution.

You could also use base R and plot a simple histogram:

hist(x)

There you can also change the bin width (see the breaks argument), but the default might be sufficient to show the concept.

You can identify which bin in this histogram has the most entries with

h <- hist(x, plot = FALSE) # compute the bins once, without drawing the plot again
h$mids[which.max(h$counts)]
#[1] 0.45

In this case, this means that most values occur near 0.45 (the midpoint of the bin covering the range from 0.4 to 0.5).
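
If you would rather not depend on bin edges, the same idea can be sketched with the kernel density estimate from base R's `density()`, taking the location of its peak as a rough estimate of the mode. This is only a sketch: it uses the default bandwidth, and the variable name `mode_estimate` is my own.

```r
set.seed(123)                  # same seed as above, for reproducibility
x <- runif(100)

d <- density(x)                # kernel density estimate with the default bandwidth
mode_estimate <- d$x[which.max(d$y)]  # grid point where the estimated density peaks
mode_estimate
```

For uniform random numbers the true density is flat, so the peak is mostly sampling noise and will move when the seed or the bandwidth changes.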

Hope this helps.

Upvotes: 1

Verena Praher

Reputation: 1272

You can do this:

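set.seed(12)
x <- runif(100, min = 0, max = 1)
n <- length(x)
x_cut <- cut(x, breaks = n / 4) # split the range into 25 equal-width intervals
tab <- table(x_cut)              # count the values in each interval
which(tab == max(tab))           # interval(s) with the highest count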

The result depends on the breaks value you set. This is an alternative to a histogram if you don't need the plot itself.
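
To see how much the choice of breaks matters, here is a small sketch that wraps the same steps in a function and tries two different settings. The helper name `fullest_bins` is hypothetical, and the values 10 and 25 are arbitrary examples:

```r
set.seed(12)
x <- runif(100)

# Hypothetical helper: label(s) of the interval(s) holding the most values
fullest_bins <- function(x, breaks) {
  counts <- table(cut(x, breaks = breaks))
  names(counts)[counts == max(counts)] # keeps every interval tied for the maximum
}

fullest_bins(x, breaks = 10)
fullest_bins(x, breaks = 25)
```

Returning all tied intervals, instead of only the first, makes it obvious when the "most occurring" range is not unique.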

Upvotes: 0
