becbot

Reputation: 173

Use a function on multiple columns (variables) in R

I am trying to run tests of homogeneity of variance using the leveneTest function from the car package. I can run the test on a single variable like so (using the iris dataset as an example):

library(car)
library(datasets)

data(iris)

leveneTest(iris$Sepal.Length, iris$Species)

However, I would like to run the test on all the dependent variables in the dataset simultaneously (so Sepal.Length, Sepal.Width, Petal.Length, Petal.Width). I am guessing it has something to do with the apply family of functions (sapply, lapply, tapply) but I just can't figure out how. The closest I came is something like this:

lapply(iris, leveneTest(group = iris$Species))

However I get the error

Error in leveneTest.default(group = iris$Species) : 
  argument "y" is missing, with no default

Which I understand is probably because it isn't able to specify the outcome variables. I am certain I must be missing some obvious use of the apply functions, but I just don't understand what it is. Apologies for the basic question, but I am relatively new to R and am often applying the same function to multiple variables (usually by copying the code several times), so it would be great to understand how to use these functions properly :)

Upvotes: 1

Views: 1117

Answers (2)

Chris R

Reputation: 78

Piggybacking on @Roland's answer, you can do the following in base R as well:

lapply(iris[,-5], leveneTest, group = iris$Species)

The -5 is obviously specific to the iris dataset, where Species is the fifth column. You could replace it with an expression like

lapply(iris[,-length(iris)]....

and that would let you remove the last element of the df, assuming your grouping variable is last.
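If you would rather not depend on where the grouping column sits, one option is to pick the outcome columns by type instead of by position. This is only a sketch, assuming every numeric column is an outcome you want to test; the names outcomes, results and p_values are made up for illustration.

```r
library(car)

# Keep every numeric column as an outcome, wherever Species happens to sit
outcomes <- iris[sapply(iris, is.numeric)]

# One Levene test per outcome; group is forwarded to leveneTest through ...
results <- lapply(outcomes, leveneTest, group = iris$Species)

# Each result is an anova-style table; take the p-value from its first row
p_values <- sapply(results, function(res) res[["Pr(>F)"]][1])
p_values
```

The list returned by lapply keeps the column names, so results$Sepal.Length gives the same table as the single-variable call in the question.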

Additionally, as a data.table fanboy, I'll add an option for you to use that as well, if you're interested. Here dt.iris is assumed to be the iris data converted with as.data.table(iris).

dt.iris[, lapply(.SD, leveneTest, group = Species), .SDcols = !'Species']

This code excludes the Species column from the lapply call, much like the base R examples above, but by naming it explicitly via .SDcols, so that .SD holds only the remaining columns. Then you run your analysis in a fairly straightforward manner. Hope this helps!

Upvotes: 2

Roland

Reputation: 132651

Common parameters to the function need to be passed to ... within lapply. Like this:

lapply(subset(iris, select = -Species), leveneTest, group = iris$Species)

help("lapply") explains that ... is for "optional arguments to FUN" (meaning optional for lapply not for FUN) and provides lapply(x, quantile, probs = 1:3/4) as an example.
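To see what the ... forwarding does, it may help to compare it with the equivalent anonymous-function form. The following is a small base R sketch; the list x and the names q_dots and q_anon are invented for the example.

```r
# A small list to iterate over (made-up data)
x <- list(a = 1:10, beta = exp(-3:3))

# Extra arguments after FUN are passed on to every call of quantile
q_dots <- lapply(x, quantile, probs = 1:3/4)

# The same thing written with an explicit anonymous function
q_anon <- lapply(x, function(v) quantile(v, probs = 1:3/4))

# Both forms give the same result
stopifnot(identical(q_dots, q_anon))
```

The same pattern applies to the leveneTest call: each column is supplied as the first argument, and group = iris$Species is handed to every call unchanged.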

Upvotes: 6
