naeum

Reputation: 45

Generate samples from data following normal distribution but with new mean

I have a vector of numbers generated like this:

set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
mean(x)
[1] 1.454307

Essentially, I want to randomly sample 2000 numbers from x such that the mean of this sample is lower than mean(x).

The key is I don't want to generate new random numbers but only sample from x, without replacement, such that I get a subset with a different mean.

Can anyone help me?

Thanks!

Upvotes: 2

Views: 371

Answers (3)

Ralf Stubner

Reputation: 26843

How about doing rejection sampling, i.e. repeatedly drawing 2000 numbers from your vector until one draw fulfills the desired property?

set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
m_x <- mean(x)

y <- sample(x, 2000)
while(mean(y) >= m_x)
    y <- sample(x, 2000)

mean(y)
#> [1] 1.4477

Created on 2019-06-18 by the reprex package (v0.3.0)

This should be quite fast, since there is a (roughly) even chance for the new mean to be greater or smaller than the old one.
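If the sample mean has to fall below a specific value rather than just below mean(x), the same loop can be wrapped in a function. The following is only a sketch: `sample_below`, `target` and `max_tries` are hypothetical names, and the 0.005 offset is an arbitrary choice for illustration.

```r
# Hypothetical helper: redraw subsets of x until the subset mean is below `target`.
sample_below <- function(x, n, target, max_tries = 10000) {
  # The n smallest values give the lowest achievable mean;
  # a target at or below that can never be met.
  stopifnot(n <= length(x), target > mean(sort(x)[seq_len(n)]))
  for (i in seq_len(max_tries)) {
    y <- sample(x, n)                 # sample() draws without replacement by default
    if (mean(y) < target) return(y)
  }
  stop("no qualifying sample found in ", max_tries, " tries")
}

set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
y <- sample_below(x, 2000, target = mean(x) - 0.005)
mean(y) < mean(x) - 0.005
```

The mean of a 2000-element subset varies only on the order of sd(x) / sqrt(2000), so the acceptance rate drops quickly as `target` moves further below mean(x). For a much lower mean, plain rejection sampling is likely to become impractical.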

Upvotes: 1

Shree

Reputation: 11150

This method is not truly "random", as it only picks from values that are no larger than mean(x). Let me know if this is good enough for you:

set.seed(1)

x <- rnorm(8334, 1.456977, 0.3552899)

mean(x)
[1] 1.454307

y <- sample(x, 2000, prob = x <= mean(x)) # x > mean(x) has 0 chance of getting sampled

all(y %in% x)
[1] TRUE

mean(y)
[1] 1.170856

This is effectively the same as:

z <- sample(x[x <= mean(x)], 2000)

all(z %in% x)
[1] TRUE

mean(z)
[1] 1.172033

Also, for 2000 values, the lowest possible mean is the mean of the 2000 smallest values:

mean(sort(x)[1:2000])
[1] 0.9847526
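The same sorting trick gives both ends of the feasible range for a 2000-element subset. This is a small sketch; the names `xs`, `lowest` and `highest` are only illustrative.

```r
set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
n <- 2000

xs <- sort(x)
lowest  <- mean(head(xs, n))   # mean of the n smallest values
highest <- mean(tail(xs, n))   # mean of the n largest values

c(lowest = lowest, highest = highest)
```

Any target mean outside this interval cannot be reached by sampling 2000 values from x without replacement.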

UPDATE -

Here's one way to get a random sample from both sides of mean(x). The weighting is arbitrary, and I don't know whether it guarantees a sample mean below mean(x):

z <- sample(x, 2000, prob = (x <= mean(x)) + 0.1)

mean(z)
[1] 1.225991

table(z <= mean(x))

FALSE  TRUE 
  202  1798
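One way to probe the open question about a guarantee is to repeat the weighted draw many times and look at the resulting means. This is a rough empirical check under the same weights, not a proof; the 50 repetitions and the names `w` and `means` are arbitrary choices for the sketch.

```r
set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)

# Same weights as in the update: 1.1 for values at or below mean(x), 0.1 otherwise.
w <- (x <= mean(x)) + 0.1

# Repeat the weighted draw (without replacement) and record each sample mean.
means <- replicate(50, mean(sample(x, 2000, prob = w)))

range(means)
mean(means < mean(x))   # share of draws whose mean is below mean(x)
```

Even if every draw lands below mean(x), that is still only empirical evidence. In principle a weighted draw can select mostly large values, so combining the weights with a `while` check on mean(z) would be needed for a hard guarantee.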

Upvotes: 2

userSM12312

Reputation: 59

Simulate a normal distribution for the example:

x <- rnorm(8334, 1.45, 0.355)

Pick a sample of 2000 numbers:

y <- sample(x, 2000)

Lower the mean of y by 0.5:

y <- y - 0.5

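Increase the sd of y by a factor of 1.5, scaling around the mean so that the mean itself does not change:

y <- (y - mean(y)) * 1.5 + mean(y)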

Now the mean and the sd of y will be about:

mean(y)  # ~0.9325603
sd(y)    # ~0.5348885

I hope this is the answer you are looking for.

Upvotes: 0
