Becca

Reputation: 69

How to use apply() with a user-defined function in R - error saying it's not a function

I've written a function that calculates the density of a bivariate normal distribution using the width and length of flowers. I want to apply the function to a data frame to calculate the density for each row. I'm trying to use the apply() function to do this but it's giving me an error saying my function isn't a function. My function does work when I use it for a single row so I don't think it's a problem with the function itself. I tried looking into it but couldn't find much on how to implement a user defined function in apply(). Here is my code with some sample data.

density_fn<- function(x, y, mu_x, mu_y, sigma){
  mean_vec<- matrix(c((x - mu_x), (y - mu_y)))
  sigma_det<- det(sigma)
  sigma_inv<- solve(sigma)
  frac<- 1/(2*pi*sqrt(sigma_det))
  exponent<- exp(-0.5%*%t(mean_vec)%*%sigma_inv%*%mean_vec)
  den_fn<- frac*exponent
  return(den_fn)
}

flower<- data.frame(
  Width = c(20, 32, 29),
  Length = c( 51, 66, 48)
)
flower_w_mean<- 27
flower_l_mean<- 55
cov_matrix<- matrix(c(39, 0, 0, 93), nrow=2, ncol=2) 

apply(flower, 1, FUN = density_fn(flower$Width, flower$Length, 
                            flower_w_mean, flower_l_mean, cov_matrix))

Originally, I got this error:

Error in -0.5 %*% t(mean_vec) %*% sigma_inv : non-conformable arguments

I thought it was an issue with my covariance matrix, so I took out everything but the first line of the function and returned mean_vec and that's when I got this error:

Error in match.fun(FUN) : c("'density_fn(flower$Width, flower$Length, flower_w_mean, flower_l_mean, ' is not a function, character or symbol", "' cov_matrix)' is not a function, character or symbol")

Anyone know how to properly apply this function to a data frame?

Upvotes: 1

Views: 357

Answers (1)

Abdur Rohman

Reputation: 2944

First, let me explain the two error messages that you got.

The first error

Error in -0.5 %*% t(mean_vec) %*% sigma_inv : non-conformable arguments

This error means that the operands of a matrix multiplication have incompatible dimensions. For A %*% B, the number of columns of A (or the length of A, if A is a vector) must equal the number of rows of B.

In density_fn, the definition mean_vec <- matrix(c((x - mu_x), (y - mu_y))) makes mean_vec a matrix with exactly 1 column and length(x) + length(y) rows. Its transpose therefore has length(x) + length(y) columns, so t(mean_vec) %*% sigma_inv fails whenever that number differs from the number of rows of sigma_inv. In your apply call, the whole columns flower$Width and flower$Length are passed, which gives 6 columns against the 2 rows of sigma_inv. For example:

x <- 1:2  # length of 2
y <- 1:3 # length of 3 
sigma <- matrix(1:4, nrow = 2)
sigma_inv <- solve(sigma)
mean_vec <- matrix(c(x - mean(x), y - mean(y))) 
mean_vec # 5 rows, 1 column
#     [,1]
#[1,] -0.5
#[2,]  0.5
#[3,] -1.0
#[4,]  0.0
#[5,]  1.0

t(mean_vec) %*% sigma_inv
# Error in t(mean_vec) %*% sigma_inv : non-conformable arguments

The multiplication does not fail if x and y each have length 1, because mean_vec then has 2 rows, which matches the 2 x 2 sigma_inv. That's why the function works when you use it for a single row. For example:

x <- 2  # length of 1
y <- 3 # length of 1 
sigma <- matrix(1:4, nrow = 2)
mean_vec <- matrix(c(x - mean(x), y - mean(y))) 
sigma_inv <- solve(sigma)
t(mean_vec) %*% sigma_inv
#     [,1] [,2]
# [1,]    0    0

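An alternative way to make the dimensions conformable is to set the number of rows and columns of mean_vec explicitly in matrix(). Note that for scalar x and y this recycles the two deviations into a 2 x 2 matrix with two identical columns, and it does not make the function work for x and y of arbitrary length: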

mean_vec <- matrix(c((x - mu_x), (y - mu_y)), 
                     ncol = nrow(sigma), 
                     nrow = ncol(sigma))

and then to change the matrix multiplication -0.5 %*% t(mean_vec) to the scalar multiplication -0.5 * t(mean_vec).

So, the function becomes:

density_fn <- function(x, y, mu_x, mu_y, sigma) {
  mean_vec <- matrix(c((x - mu_x), (y - mu_y)), 
                     ncol = nrow(sigma), 
                     nrow = ncol(sigma))
  sigma_det <- det(sigma)
  sigma_inv <- solve(sigma)
  frac <- 1 / (2 * pi * sqrt(sigma_det))
  exponent <- exp(-0.5 * t(mean_vec) %*% sigma_inv %*% mean_vec)
  den_fn <- frac * exponent
  return(den_fn)
}
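To see what the explicit nrow/ncol setting does for a single flower, here is a minimal sketch using the first row of the sample data. The names dev, m_col, m_sq, q_col and q_sq are only illustrative and do not come from the original code.

```r
sigma <- matrix(c(39, 0, 0, 93), nrow = 2, ncol = 2)
dev <- c(20 - 27, 51 - 55)   # deviations from the means for the first flower

m_col <- matrix(dev)         # 2 x 1 column, as in the original function
m_sq  <- matrix(dev, ncol = nrow(sigma), nrow = ncol(sigma))  # recycled to 2 x 2

q_col <- t(m_col) %*% solve(sigma) %*% m_col  # 1 x 1 quadratic form
q_sq  <- t(m_sq)  %*% solve(sigma) %*% m_sq   # 2 x 2, every entry equals q_col

dim(q_col)
dim(q_sq)
```

Because both columns of m_sq are the same deviation vector, every entry of q_sq is the same quadratic form, so the function above returns a 2 x 2 matrix of identical values for each flower.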

The second error

Error in match.fun(FUN) : c("'density_fn(flower$Width, flower$Length, flower_w_mean, flower_l_mean, ' is not a function, character or symbol", "' cov_matrix)' is not a function, character or symbol")

This error means that FUN in apply is not specified correctly. In your call, density_fn(...) is evaluated first, so FUN receives the result of that call (a matrix) instead of a function. According to the documentation of apply, FUN

typically is either a function or a symbol (e.g., a backquoted name) or a character string specifying a function to be searched for from the environment of the call to apply

In other words, pass only the function itself (for example, its name) to FUN. The additional arguments of the function (the 2nd argument, the 3rd one, ...) are passed after FUN, not inside a call within FUN. Please check ?apply for details.

However, this step alone does not solve the problem, because apply passes each row to FUN as a single vector, whereas density_fn expects x and y as separate arguments. For a function of several arguments, mapply is more suitable. Other options include Map and a for loop.
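If you would rather stay with apply, one possible approach is a small wrapper that accepts a whole row as a single named vector. This is only a sketch: row_density is a hypothetical helper, not part of the original code, and it assumes the columns are named Width and Length as in the sample data.

```r
flower <- data.frame(Width = c(20, 32, 29), Length = c(51, 66, 48))
cov_matrix <- matrix(c(39, 0, 0, 93), nrow = 2, ncol = 2)

# Hypothetical wrapper: receives one row of the data frame as a named vector.
row_density <- function(row, mu_x, mu_y, sigma) {
  dev <- matrix(c(row[["Width"]] - mu_x, row[["Length"]] - mu_y))  # 2 x 1 column
  quad <- drop(t(dev) %*% solve(sigma) %*% dev)                    # plain number
  exp(-0.5 * quad) / (2 * pi * sqrt(det(sigma)))
}

# Pass the function itself to FUN; the extra arguments follow it by name.
dens <- apply(flower, 1, row_density,
              mu_x = 27, mu_y = 55, sigma = cov_matrix)
dens
```

Here apply calls row_density once per row and forwards mu_x, mu_y and sigma unchanged to every call, which is the role of the arguments placed after FUN.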

How to apply functions using mapply

It is best explained by a simple example. Please check ?mapply for details. Suppose you have x and y and you want to get z = 2x + 3y, with the function applied elementwise to corresponding values of x and y.

x <- c(3,4,5)
y <- c(10,20, 30)
myfun <- function(x,y) 2*x + 3*y
z <- mapply(myfun, x, y)
z
#[1]  36  68 100

If myfun has arguments other than x and y that stay the same for every call, pass them as a list to MoreArgs. Here is the example using density_fn.

mapply(density_fn, x = flower$Width, y = flower$Length, MoreArgs = list(
                                            mu_x = flower_w_mean, 
                                            mu_y = flower_l_mean, 
                                            sigma = cov_matrix))
#         [,1]        [,2]        [,3]
# [1,] 0.001293784 0.001000747 0.001929141
# [2,] 0.001293784 0.001000747 0.001929141
# [3,] 0.001293784 0.001000747 0.001929141
# [4,] 0.001293784 0.001000747 0.001929141

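These steps do not produce errors. Each column of the output corresponds to one flower; the four rows are identical because the 2 x 2 mean_vec turns the quadratic form into a 2 x 2 matrix of equal entries. However, because of my limited knowledge of the subject, I do not guarantee that these steps properly represent the density function of the bivariate normal distribution as you intended.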
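One way to check the values is sketched below. It assumes a variant of the function that keeps mean_vec as a 2 x 1 column and returns a plain number; density_scalar is a hypothetical name, not the original density_fn. The comparison against dnorm is valid only because the sample covariance matrix is diagonal, so the bivariate density factorises into two univariate normal densities.

```r
flower <- data.frame(Width = c(20, 32, 29), Length = c(51, 66, 48))
cov_matrix <- matrix(c(39, 0, 0, 93), nrow = 2, ncol = 2)

# Hypothetical variant of density_fn that returns one number per call.
density_scalar <- function(x, y, mu_x, mu_y, sigma) {
  dev <- matrix(c(x - mu_x, y - mu_y))            # 2 x 1 column vector
  quad <- drop(t(dev) %*% solve(sigma) %*% dev)   # quadratic form as a number
  exp(-0.5 * quad) / (2 * pi * sqrt(det(sigma)))
}

dens <- mapply(density_scalar, x = flower$Width, y = flower$Length,
               MoreArgs = list(mu_x = 27, mu_y = 55, sigma = cov_matrix))

# Cross-check for a diagonal covariance matrix only:
# the joint density is the product of the two marginal normal densities.
check <- dnorm(flower$Width, 27, sqrt(39)) * dnorm(flower$Length, 55, sqrt(93))
all.equal(dens, check)
```

With this variant, mapply returns a vector with one density per flower instead of a matrix with repeated rows, and the values agree with the columns of the output shown above.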

Upvotes: 0
