Jazzmine

Reputation: 1875

Create sample vector data in R with a skewed distribution with limited range

I want to create a sample vector of data in R in which I can control the range of values selected. I think I want to use sample to limit the range of values generated, rather than an rnorm-type command that generates values based on the type of distribution, variance, SD, etc.

So I'm looking to draw a sample within a specified range (e.g. 1-5) from a skewed distribution, something like this:

x=rexp(100,1/10)

Here's what I have, but it does not produce a skewed distribution:

y = sample(1:5, 234, replace = TRUE)

How can I have my cake (limited range) and eat it too (skewed distribution), so to speak?

Thanks

Upvotes: 4

Views: 6370

Answers (3)

George Pipis

Reputation: 1822

The beta distribution takes values from 0 to 1. If you want your values to range from 0 to 5, for instance, you can multiply them by 5. You can also control the skewness through the two shape parameters of the beta distribution, which gives three cases: negative (left) skew, positive (right) skew, and symmetric.

Using R and the beta distribution, you can generate similar distributions as follows. In each plot, the green vertical line marks the mean and the red vertical line marks the median:

# shape1 > shape2: mass piles up near 1, long tail to the left
x = rbeta(10000, 5, 2)
hist(x, main = "Negative or Left Skewness", freq = FALSE)
lines(density(x), col = "red", lwd = 3)
abline(v = c(mean(x), median(x)), col = c("green", "red"), lty = c(2, 2), lwd = c(3, 3))


# shape1 < shape2: mass piles up near 0, long tail to the right
x = rbeta(10000, 2, 5)
hist(x, main = "Positive or Right Skewness", freq = FALSE)
lines(density(x), col = "red", lwd = 3)
abline(v = c(mean(x), median(x)), col = c("green", "red"), lty = c(2, 2), lwd = c(3, 3))


# shape1 == shape2: symmetric around 0.5
x = rbeta(10000, 5, 5)
hist(x, main = "Symmetrical", freq = FALSE)
lines(density(x), col = "red", lwd = 3)
abline(v = c(mean(x), median(x)), col = c("green", "red"), lty = c(2, 2), lwd = c(3, 3))
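
The rescaling mentioned at the top of this answer can be adapted to the 1-5 range from the question. What follows is only a rough sketch: the seed, the names lo, hi, n, b, y_cont and y_int, and the choice of shape parameters (2, 5) are my own assumptions for illustration, and binning with floor is just one of several ways to get integers.

```r
set.seed(42)            # arbitrary seed, only so the sketch is reproducible

lo <- 1                 # lower bound of the range in the question
hi <- 5                 # upper bound of the range in the question
n  <- 234               # sample size used in the question

# rbeta() returns values in (0, 1); shape1 < shape2 gives a right skew
b <- rbeta(n, shape1 = 2, shape2 = 5)

# Continuous values: stretch (0, 1) to (lo, hi)
y_cont <- lo + (hi - lo) * b

# Integer values: cut (0, 1) into (hi - lo + 1) equal-width bins;
# pmin() guards the upper edge in case b is numerically equal to 1
y_int <- pmin(hi, floor(lo + (hi - lo + 1) * b))

range(y_cont)
table(factor(y_int, levels = lo:hi))
```

Swapping the two shape values should flip the skew to the left, and equal shape values should give a symmetric result, as in the three plots above.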


Upvotes: 6

sdo

Reputation: 21

To better see what the sample function is doing with integers, use the barplot function, not the histogram function:

set.seed(3)
barplot(table(sample(1:10, size = 100, replace = TRUE, prob = 10:1)))
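
If you also want numbers next to the bars, here is a small sketch comparing the observed proportions with the ones implied by prob. The weights passed to prob need not sum to one; sample treats them as relative weights. The names w, draws, expected and observed are just illustrative.

```r
set.seed(3)

w     <- 10:1                                   # relative weights, as in the call above
draws <- sample(1:10, size = 100, replace = TRUE, prob = w)

# Probabilities implied by the weights
expected <- w / sum(w)

# Observed proportions; factor() keeps values that were never drawn as zero counts
observed <- as.vector(table(factor(draws, levels = 1:10))) / length(draws)

round(rbind(expected, observed), 2)
```

With only 100 draws the observed row will wobble around the expected one; a larger size should bring them closer.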


Upvotes: 2

DatamineR

Reputation: 9618

set.seed(3)
hist(sample(1:10, size = 100, replace = TRUE, prob = 10:1))
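
The same idea applied to the 1-5 range and 234 draws from the question might look like the sketch below. Taking the weights from dexp with rate 0.5 is purely an assumption to mimic the exponential shape the question mentions; any decreasing vector of non-negative weights, such as 5:1, would also give a right skew. The names vals, w and y are illustrative.

```r
set.seed(3)

vals <- 1:5                          # allowed values from the question
w    <- dexp(vals, rate = 0.5)       # assumed weights: exponential density at each value

# prob takes relative weights, so w does not have to sum to one
y <- sample(vals, size = 234, replace = TRUE, prob = w)

# Counts per value; wrap this in barplot() to see the shape
table(factor(y, levels = vals))
```

Every draw stays inside 1-5 because sample only ever returns elements of vals, while the weights control how strongly the counts lean toward the low end.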


Upvotes: 8
