I've written a function that calculates the density of a bivariate normal distribution using the width and length of flowers. I want to apply the function to a data frame to calculate the density for each row. I'm trying to use the apply() function to do this but it's giving me an error saying my function isn't a function. My function does work when I use it for a single row so I don't think it's a problem with the function itself. I tried looking into it but couldn't find much on how to implement a user defined function in apply(). Here is my code with some sample data.
density_fn<- function(x, y, mu_x, mu_y, sigma){
mean_vec<- matrix(c((x - mu_x), (y - mu_y)))
sigma_det<- det(sigma)
sigma_inv<- solve(sigma)
frac<- 1/(2*pi*sqrt(sigma_det))
exponent<- exp(-0.5%*%t(mean_vec)%*%sigma_inv%*%mean_vec)
den_fn<- frac*exponent
return(den_fn)
}
flower<- data.frame(
Width = c(20, 32, 29),
Length = c( 51, 66, 48)
)
flower_w_mean<- 27
flower_l_mean<- 55
cov_matrix<- matrix(c(39, 0, 0, 93), nrow=2, ncol=2)
apply(flower, 1, FUN = density_fn(flower$Width, flower$Length,
flower_w_mean, flower_l_mean, cov_matrix))
Originally, I got this error:
Error in -0.5 %*% t(mean_vec) %*% sigma_inv : non-conformable arguments
I thought it was an issue with my covariance matrix, so I took out everything but the first line of the function and returned mean_vec, and that's when I got this error:
Error in match.fun(FUN) : c("'density_fn(flower$Width, flower$Length, flower_w_mean, flower_l_mean, ' is not a function, character or symbol", "' cov_matrix)' is not a function, character or symbol")
Anyone know how to properly apply this function to a data frame?
Upvotes: 1
Views: 357
Reputation: 2944
First, let me explain the two error messages you got.
The first error
Error in -0.5 %*% t(mean_vec) %*% sigma_inv : non-conformable arguments
This error means that the operands of a matrix multiplication are not conformable. For A %*% B, the number of columns of A (if it is a matrix) or the length of A (if it is a vector) generally has to equal the number of rows of B. (R promotes a vector to a row or column matrix when that makes the two operands conformable.)
In density_fn, the definition mean_vec<- matrix(c((x - mu_x), (y - mu_y))) makes mean_vec a matrix with exactly 1 column and as many rows as the combined length of x and y. Consequently, t(mean_vec) has that many columns, and the matrix multiplication t(mean_vec) %*% sigma_inv fails whenever the combined length differs from the number of rows of sigma_inv. For example:
x <- 1:2 # length of 2
y <- 1:3 # length of 3
sigma <- matrix(1:4, nrow = 2)
sigma_inv <- solve(sigma)
mean_vec <- matrix(c(x - mean(x), y - mean(y)))
mean_vec # 5 rows, 1 column
# [,1]
#[1,] -0.5
#[2,] 0.5
#[3,] -1.0
#[4,] 0.0
#[5,] 1.0
t(mean_vec) %*% sigma_inv
# Error in t(mean_vec) %*% sigma_inv : non-conformable arguments
With a 2 x 2 sigma, it will not result in an error if x and y each have length 1. That's why the function works when you use it for a single row. For example:
x <- 2 # length of 1
y <- 3 # length of 1
sigma <- matrix(1:4, nrow = 2)
mean_vec <- matrix(c(x - mean(x), y - mean(y)))
sigma_inv <- solve(sigma)
t(mean_vec) %*% sigma_inv
# [,1] [,2]
# [1,] 0 0
One way to make the function run without this error is to fix the number of rows and columns of mean_vec to match sigma, as follows:
mean_vec <- matrix(c((x - mu_x), (y - mu_y)),
                   ncol = nrow(sigma),
                   nrow = ncol(sigma))
and then to change the matrix multiplication -0.5 %*% t(mean_vec) to the scalar multiplication -0.5 * t(mean_vec).
So, the function becomes:
density_fn <- function(x, y, mu_x, mu_y, sigma) {
  mean_vec <- matrix(c((x - mu_x), (y - mu_y)),
                     ncol = nrow(sigma),
                     nrow = ncol(sigma))
  sigma_det <- det(sigma)
  sigma_inv <- solve(sigma)
  frac <- 1 / (2 * pi * sqrt(sigma_det))
  exponent <- exp(-0.5 * t(mean_vec) %*% sigma_inv %*% mean_vec)
  den_fn <- frac * exponent
  return(den_fn)
}
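Note that with a 2 x 2 sigma and single values for x and y, the matrix() call recycles the two deviations into a 2 x 2 matrix, so this function returns a 2 x 2 matrix holding the same value four times. Below is a minimal sketch of a variant that keeps the deviations as a 2 x 1 column vector and returns a single number. The name density_fn_scalar is made up for illustration, and the sketch assumes x and y are single numbers and sigma is an invertible 2 x 2 matrix.

```r
# Hypothetical variant: one density value per (x, y) pair.
density_fn_scalar <- function(x, y, mu_x, mu_y, sigma) {
  # Deviations from the mean as a 2 x 1 column vector
  dev <- matrix(c(x - mu_x, y - mu_y), ncol = 1)
  # Quadratic form t(dev) %*% solve(sigma) %*% dev, a 1 x 1 matrix
  quad <- t(dev) %*% solve(sigma) %*% dev
  # Normalising constant of the bivariate normal density
  frac <- 1 / (2 * pi * sqrt(det(sigma)))
  # Drop the 1 x 1 matrix structure so a plain number is returned
  as.numeric(frac * exp(-0.5 * quad))
}

cov_matrix <- matrix(c(39, 0, 0, 93), nrow = 2, ncol = 2)
density_fn_scalar(20, 51, 27, 55, cov_matrix)  # about 0.00129
```

Used with mapply, a function like this should give one value per row of the data frame rather than a four-row matrix.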
The second error
Error in match.fun(FUN) : c("'density_fn(flower$Width, flower$Length, flower_w_mean, flower_l_mean, ' is not a function, character or symbol", "' cov_matrix)' is not a function, character or symbol")
This error message shows that the value of FUN in apply is not properly specified. According to the documentation of apply, FUN "typically is either a function or a symbol (e.g., a backquoted name) or a character string specifying a function to be searched for from the environment of the call to apply". Writing density_fn(...) with arguments calls the function immediately and hands its result, not the function, to FUN. So you should pass only the function name to FUN. The additional arguments of the function (the 2nd argument, the 3rd one, ...) should be supplied after FUN, not within FUN. Please check ?apply for details.
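As a small illustration of passing extra arguments after FUN, here is a sketch with a toy function. The names scaled_sum, m and k are made up for this example and have nothing to do with the density code.

```r
# Toy function with one fixed extra argument (names are illustrative)
scaled_sum <- function(v, k) k * sum(v)

m <- matrix(1:6, nrow = 2)  # rows are c(1, 3, 5) and c(2, 4, 6)

# Correct: pass the function itself; extra arguments follow FUN
apply(m, 1, scaled_sum, k = 2)
# [1] 18 24

# Incorrect: this would call scaled_sum() first and pass its result
# (a number, not a function) to FUN, which match.fun() rejects
# apply(m, 1, FUN = scaled_sum(m, k = 2))
```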
However, this step alone does not solve the problem, because apply passes each row to FUN as a single vector, whereas density_fn expects x and y as two separate arguments. Because your function takes several inputs that vary together, mapply is more suitable. Other options include Map and loops using for.
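To make the difference concrete, here is a minimal sketch using the sample data from the question. area_fn is a hypothetical stand-in for any function that takes two separate arguments.

```r
flower <- data.frame(Width = c(20, 32, 29), Length = c(51, 66, 48))

# Stand-in for any function that takes two separate arguments
area_fn <- function(x, y) x * y

# apply() hands each row over as one named vector, so a wrapper must unpack it
by_apply <- apply(flower, 1, function(row) area_fn(row["Width"], row["Length"]))

# mapply() walks the two columns in parallel and needs no wrapper
by_mapply <- mapply(area_fn, flower$Width, flower$Length)
by_mapply
# [1] 1020 2112 1392
```

Both calls compute the same three products; the mapply form simply avoids the unpacking step.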
How to apply functions using mapply
It is best explained by a simple example. Please check ?mapply for details. Suppose you have x and y and you want to get z = 2x + 3y, computed element by element over x and y.
x <- c(3,4,5)
y <- c(10,20, 30)
myfun <- function(x,y) 2*x + 3*y
z <- mapply(myfun, x, y)
z
#[1] 36 68 100
If myfun has arguments other than x and y that stay the same for every call, you should pass them as a list to MoreArgs. Here is the example using density_fn.
mapply(density_fn, x = flower$Width, y = flower$Length, MoreArgs = list(
mu_x = flower_w_mean,
mu_y = flower_l_mean,
sigma = cov_matrix))
# [,1] [,2] [,3]
# [1,] 0.001293784 0.001000747 0.001929141
# [2,] 0.001293784 0.001000747 0.001929141
# [3,] 0.001293784 0.001000747 0.001929141
# [4,] 0.001293784 0.001000747 0.001929141
These steps run without errors. However, because of my limited knowledge of the subject, I cannot guarantee that they correctly compute the density of the bivariate normal distribution as you intended.
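One way to sanity-check the numbers, assuming the standard bivariate normal density is what is intended: the sample cov_matrix has zero off-diagonal entries, so the joint density factorises into the product of two univariate normal densities, which base R's dnorm() computes. The sketch below relies on that diagonal structure and would not apply to a covariance matrix with non-zero covariance.

```r
flower <- data.frame(Width = c(20, 32, 29), Length = c(51, 66, 48))
flower_w_mean <- 27
flower_l_mean <- 55
cov_matrix <- matrix(c(39, 0, 0, 93), nrow = 2, ncol = 2)

# With zero covariance, the joint density is the product of the two marginals
check <- dnorm(flower$Width, mean = flower_w_mean, sd = sqrt(cov_matrix[1, 1])) *
  dnorm(flower$Length, mean = flower_l_mean, sd = sqrt(cov_matrix[2, 2]))
check
```

The three values in check should match any single row of the matrix printed above.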