Saroupille
Saroupille

Reputation: 629

Create a function with multiple parameters in R

I want to compute the following functions :

equation

enter image description here

here, g(x) is the density function of a distribution. I want to compute this function for several distributions. In addition, I use the library fitdistrplus.

To create g, I use the function do.call this way :

g<-function(x) {do.call(paste("d",i,sep=""),c(list(x=x),fti$estimate))}

fti$estimate contains the parameters of the distribution i.

G(x) is the cumulative distribution computed this way :

G<-function(x) {do.call(paste("p",i,sep=""),c(list(q=x),fti$estimate))}

I compute f(x) this way :

f<function(n,x) {n*g(x)*(1-G(x))^(n-1)

At last, I compute h(x) this way :

h<- function(n) {integrate(function(x) {x*f(n,x)},0,Inf)}

However, I can't plot these functions, I get the following errors :

1: In n*g(x):
Longer object length is not a multiple of shorter object length
2: In (1-G(x))^(n-1):
Longer object length is not a multiple of shorter object length
3: In x*f(n,x) :
Longer object length is not a multiple of shorter object length

Beyond, if I juste want to plot f(n,x), I get this error :

Error in list(x=x) :'x' is missing

The minimal snipset I have is the following

#i can be "exp" "lnorm" "norm" etc...
for( i in functionsName) {
    png(paste(fileBase,"_",i,"_","graphics.png",sep=""))
    plot.new()

    fti<-fitdist(data, i)
    plotdist(data,i, para=as.list(fti[[1]]))
    #fti is a datatable or datafram
    #fti$estimate looks like this :
    # meanlog    sdlog 
    #8.475449 1.204958 

    #g
    pdf<-function(x) {do.call(paste("d",i,sep=""), c(list(x=x),fti$estimate))}  
#G
    cdf<-function(x) do.call(paste("p",i,sep=""), c(list(q=x),fti$estimate))


#f
    minLaw<- function(n,x) {n*pdf(x)*(1-cdf(x))^(n-1)}
    #h
    minExpectedValue<-function(n) {integrate(function(x) {x*minLaw(n,x)},0,Inf)}

    #these 2 following lines give an error
    plot(minExpectedValue)
    plot(minLaw)

    dev.off()
}

Upvotes: 0

Views: 5353

Answers (2)

Hong Ooi
Hong Ooi

Reputation: 57686

The problem is that the plot method for a function expects that the function will be vectorised. In other words, if given an argument of length N, it should return a vector of results also of length N.

Your minExpectedValue doesn't satisfy this; it expects that n will be a scalar, and returns a scalar. You can quickly fix this up with Vectorize. You also need to specify the name of the argument to plot over, in this case n.

minExpectedValue <- Vectorize(minExpectedValue)
plot(minExpectedValue, xname="n")

Upvotes: 1

Teemu Daniel Laajala
Teemu Daniel Laajala

Reputation: 2366

I had to do some reverse engineering to figure out your d1, q1 etc calls, but I think this is how you do it. Perhaps the original problem lies in a function call like f(n=2:3, x=1:9); in such a call n should be a single value, not a vector of values.

Even if length of x was a multiple of n length, the output would most likely not be what you really wanted. If you try to give n a vector form, you might end up in a recycled (false) output:

> print(data.frame(n=2:3, x=1:6))
- n x
1 2 1
2 3 2
3 2 3
4 3 4
5 2 5
6 3 6

where x would be evaluated with n=2 at point x=1, n=3 at point x=2 etc. What you really would've wanted is something in the lines of:

> print(expand.grid(x=1:5, n=2:3))
-  x n
1  1 2
2  2 2
3  3 2
4  4 2
5  5 2
6  1 3
7  2 3
8  3 3
9  4 3
10 5 3

You could do this by calling separately for each n value:

lapply(2:3, FUN=function(n) (f(n, x=1:5)))
#[[1]]
#[1] 0.0004981910 0.0006066275 0.0007328627 0.0008786344 0.0010456478
#
#[[2]]
#[1] 0.0007464956 0.0009087272 0.0010974595 0.0013152213 0.0015644676

Did you use the same fti for all the distribution fits, even though it should've been different? Or does the i in fti refer to index i and it was a list of fits in form of ft[[i]]?

Below is a wrapper function, which is called separately for each n-value (and distribution i):

wrapper <- function(i, x, n, fti){
    # As was provided by OP
    g<-function(x) {do.call(paste("d",i,sep=""),c(list(x=x),fti$estimate))}

    G<-function(x) {do.call(paste("p",i,sep=""),c(list(q=x),fti$estimate))}
    # does the i in fti refer to fit of i:th distribution, i.e. should it be a list where i:th location in ft is i:th distribution estimates?

    f<-function(n,x) {n*g(x)*(1-G(x))^(n-1)}
    # was missing a '-' and a '}'

    h<- function(n) {integrate(function(x) {x*f(n,x)},0,Inf)}

    list(gres = g(x), Gres = G(x), fres = f(n,x), hres = h(n))
}

# Example data
require("fitdistrplus")
data(groundbeef)
serving <- groundbeef$serving

# Gumbel distribution
d1 <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
p1 <- function(q, a, b) exp(-exp((a-q)/b))
q1 <- function(p, a, b) a-b*log(-log(p))

fti1 <- fitdist(serving, "1", start=list(a=10, b=10))
#> fti1$estimate
#       a        b 
#56.95893 29.07871

# Normal distribution

# dnorm, pnorm and qnorm are available in the default environment
d2 <- dnorm
p2 <- pnorm
q2 <- qnorm

fti2 <- fitdist(serving, "2", start=list(mean=0, sd=1))
#> fti2$estimate
#    mean       sd 
#73.67743 35.92581

# Sequence of x-values
xs <- seq(-100, 100, by=1)

print((resultdist1n2 <- wrapper(i=1, x=xs, n=2, fti=fti1))$hres)
print((resultdist1n3 <- wrapper(i=1, x=xs, n=3, fti=fti1))$hres)
print((resultdist2n2 <- wrapper(i=2, x=xs, n=2, fti=fti2))$hres)
print((resultdist2n3 <- wrapper(i=2, x=xs, n=3, fti=fti2))$hres)

plot(xs, resultdist1n2$fres, col=1, type="l", ylim=c(0,0.025), xlab="x", ylab="f(n, x)")
points(xs, resultdist1n3$fres, col=2, type="l")
points(xs, resultdist2n2$fres, col=3, type="l")
points(xs, resultdist2n3$fres, col=4, type="l")
legend("topleft", legend=c("Gamma (i=1) n=2", "Gamma (i=1) n=3", "Normal (i=2) n=2", "Normal (i=2) n=3"), col=1:4, lty=1)

f-functions with different i and n

And the results of your desired h as found in resultdist1n2$hres etc:

h(n=2) for distribution i=1:
53.59385 with absolute error < 0.00022
h(n=3) for distribution i=1:
45.23146 with absolute error < 4.5e-05
h(n=2) for distribution i=2:
53.93748 with absolute error < 1.1e-05
h(n=3) for distribution i=2:
44.06331 with absolute error < 2e-05

EDIT: Here's how one uses the lapply function to call for each of the vector of n values 0<=n<=256:

ns <- 0:256
res1 <- lapply(ns, FUN=function(nseq) wrapper(i=1, x=xs, n=nseq, fti=fti1))
par(mfrow=c(1,2))
plot.new()
plot.window(xlim=c(-100,100), ylim=c(0, 0.05))
box(); axis(1); axis(2); title(xlab="x", ylab="f(n,x)", main="f(n,x) for gamma (i=1), n=0:256")
for(i in 1:length(ns)) points(xs, res1[[i]]$fres, col=rainbow(257)[i], type="l")
# perform similarly for the other distributions by calling with i=2, fti=fti2
# h as a function of n for dist i=1
plot(ns, unlist(lapply(res1, FUN=function(x) x$hres$value)), col=rainbow(257), xlab="n", ylab="h(n)", main="h(n) for gamma (i=1), n=0:256")

I would plot each distribution i separately like this.

0<=n<=256 plots

Upvotes: 3

Related Questions