HBat
HBat

Reputation: 5682

Finding mean of standard normal distribution in a given interval

I want to find mean of standard normal distribution in a given interval.

For example, if I divide standard normal distribution into two ([-Inf:0] [0:Inf]) I want to get the mean of each half.

Following code does almost exactly what I want:

divide <- 2
boundaries <- qnorm(seq(0,1,length.out=divide+1))
t <- sort(rnorm(100000))
means.1 <- rep(NA,divide)
for (i in 1:divide) {
    means.1[i] <- mean(t[(t>boundaries[i])&(t<boundaries[i+1])])
  }    

But I need a more precise (and elegant) method to calculate these numbers (means.1).

I tried the following code but it did not work (maybe because of the lack of my probability knowledge).

divide <- 2
boundaries <- qnorm(seq(0,1,length.out=divide+1))
means.2 <- rep(NA,divide)
f <- function(x) {x*dnorm(x)}
for (i in 1:divide) {
  means.2[i] <- integrate(f,lower=boundaries[i],upper=boundaries[i+1])$value
}    

Any ideas? Thanks in advance.

Upvotes: 6

Views: 1750

Answers (5)

Josh O&#39;Brien
Josh O&#39;Brien

Reputation: 162321

Using the distrEx and distr packages:

library(distrEx)
E(Truncate(Norm(mean=0, sd=1), lower=0, upper=Inf))
# [1] 0.797884

(See vignette(distr) in the distrDoc package for an excellent overview of the suite of distr and related packages.)


Or, using just base R, here's an alternative that constructs a discrete approximation of the expectation within the interval between lb and ub. The bases of the approximating rectangles are adjusted so that they all have equal areas (i.e. so that the probability of a point falling in each one of them is identical).

intervalMean <- function(lb, ub, n=1e5, ...) {
    ## Get x-values at n evenly-spaced quantiles between lower and upper bounds
    xx <- qnorm(seq(pnorm(lb, ...), pnorm(ub, ...), length = n), ...)
    ## Calculate expectation
    mean(xx[is.finite(xx)])
}

## Your example
intervalMean(lb=0, ub=1)
# [1] 0.4598626

## The mean of the complete normal distribution
intervalMean(-Inf, Inf)
## [1] -6.141351e-17

## Right half of standard normal distribution
intervalMean(lb=0, ub=Inf)
# [1] 0.7978606

## Right half of normal distribution with mean 0 and standard deviation 100
intervalMean(lb=0, ub=Inf, mean=0, sd=100)
# [1] 79.78606

Upvotes: 1

IRTFM
IRTFM

Reputation: 263332

Let's say your cutpoints are -1, 0, 1, and 2 and you are interested in the mean of sections simulating a standard Normal.

 samp <-   rnorm(1e5)
 (res <- tapply(samp, findInterval(samp, c( -1, 0, 1, 2)), mean) )
#         0          1          2          3          4 
#-1.5164151 -0.4585519  0.4608587  1.3836470  2.3824633 

Please do note that the labeling could be improved. One improvement could be:

names(res) <-  paste("[", c(-Inf, -1, 0, 1, 2, Inf)[-6],  " , ", 
                      c(-Inf, -1, 0, 1, 2, Inf)[-1], ")", sep="")
> res
[-Inf , -1)    [-1 , 0)     [0 , 1)     [1 , 2)   [2 , Inf) 
 -1.5278185  -0.4623743   0.4621885   1.3834442   2.3835116 

Upvotes: 2

HBat
HBat

Reputation: 5682

Thanks for answering my question.

I combined all answers as I understand:

    divide <- 5
    boundaries <- qnorm(seq(0,1,length.out=divide+1))
# My original thinking        
    t <- sort(rnorm(1e6))
    means.1 <- rep(NA,divide)
    for (i in 1:divide) {
        means.1[i] <- mean(t[((t>boundaries[i])&(t<boundaries[i+1]))])
      }    

# Based on @DWin
    t <- sort(rnorm(1e6))
    means.2 <- tapply(t, findInterval(t, boundaries), mean)

# Based on @Rcoster
    means.3 <- rep(NA,divide)
    f <- function(x, ...) x * dnorm(x, ...)
    for (i in 1:divide) {
      means.3[i] <- integrate(f, boundaries[i], boundaries[i+1])$value / (pnorm(boundaries[i+1]) - pnorm(boundaries[i]))
    }   

# Based on @Kith
    t <- sort(rnorm(1e6))
    means.4 <- rep(NA,divide)    
    for (i in 1:divide) {
      means.4[i] <- fitdistr(t[t > boundaries[i] & t < boundaries[i+1]], densfun="normal")$estimate[1]
    }    

Results

>   means.1
[1] -1.4004895486 -0.5323784986 -0.0002590746  0.5313539906  1.3978177100
>   means.2   
[1] -1.3993590768 -0.5329465789 -0.0002875593  0.5321381745  1.3990997391 
>   means.3
[1] -1.399810e+00 -5.319031e-01  1.389222e-16  5.319031e-01  1.399810e+00
>   means.4
[1] -1.399057073 -0.531946615 -0.000250952  0.531615180  1.400086731

I believe @Rcoster is the one that I wanted. Rest is innovative approaches compared to mine but still approximate. Thanks.

Upvotes: 3

Rcoster
Rcoster

Reputation: 3210

The problem is that the integral of dnorm(x) in the interval (-Inf to 0) isn't 1, that's why you got the wrong answer. To correct you must divide the result you got by 0.5 (the integral result). Like:

func <- function(x, ...) x * dnorm(x, ...)
integrate(func, -Inf, 0, mean=0, sd=1)$value / (pnorm(0, mean=0, sd=1) - pnorm(-Inf, mean=0, sd=1)) 

Adapt it to differents intervals should be easy.

Upvotes: 3

kith
kith

Reputation: 5566

You can use a combination of fitdistr and vector indexing.

Here's an example of how to get mean and std of just the positive values:

library("MASS")
x = rnorm(10000)
fitdistr(x[x > 0], densfun="normal")

or just the values in the interval (0,2):

fitdistr(x[x > 0 & x < 2], densfun="normal")

Upvotes: 2

Related Questions