Greg Martin

Reputation: 343

R programming - difference between using lapply and a simple function

I'm not sure that I understand the different outputs in these two scenarios:

(1)

pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split

(2)

pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- lapply(pioneers, strsplit, split = ":")
split

In both cases the output is a list, but I'm not sure when I'd use one notation (simply applying the function to a vector) rather than the other (using lapply to loop the function over the vector).

Thanks for the help.

Greg

Upvotes: 0

Views: 156

Answers (1)

Akhil Nair

Reputation: 3284

To me, it comes down to how the output is returned. The l in lapply stands for list, i.e. the output is always returned as a list. strsplit already returns a list, because that is the only data structure that makes sense here: each string may contain a different number of :s, so each of the 4 elements of your pioneers vector gets its own list element, and each list element contains a character vector of the pieces of that string.

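So lapply(x, strsplit, ...) will always return a list of lists: each call to strsplit receives a single string and returns a one-element list, and lapply collects those into an outer list. You probably don't want that in this case.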
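A minimal sketch of the difference in nesting, using the vector from the question. The names direct and nested are only illustrative and do not appear in the original code.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

# strsplit is vectorised over its first argument:
# one list element per input string
direct <- strsplit(pioneers, split = ":")

# lapply calls strsplit once per string; every call returns a
# one-element list, so the result is a list of lists
nested <- lapply(pioneers, strsplit, split = ":")

direct[[1]]       # character vector: "GAUSS" "1777"
nested[[1]]       # a list of length 1 wrapping that vector
nested[[1]][[1]]  # character vector: "GAUSS" "1777"
```

Calling str(direct) and str(nested) at the console is a quick way to see the extra level of nesting.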

lapply is useful when the function you're applying handles only one element at a time and you expect each result to be a vector of undefined or variable length. strsplit is already vectorised and handles this itself, so wrapping it in lapply is redundant. In general, you should know what form you want your answer in, and use the appropriate functions to coerce the output into the right data structure.
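As a sketch of what coercing the output might look like here, assuming every string contains exactly one : (true for this vector, but an assumption in general). The names parts, surnames, years and m are hypothetical.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

parts <- strsplit(pioneers, split = ":")

# vapply states the expected type and length of each result,
# so it fails loudly if an element does not fit
surnames <- vapply(parts, function(p) p[1], character(1))
years <- as.integer(vapply(parts, function(p) p[2], character(1)))

# With equal-length pieces, the list can also be bound into a
# character matrix with one row per pioneer
m <- do.call(rbind, parts)
```

If the number of pieces varied between strings, keeping the plain list from strsplit would probably be the safer choice.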

To be clear, the outputs of your two examples are not the same: (1) is a list of character vectors, (2) is a list of lists. The lapply call that gives a result identical to (1) would be

lapply(pioneers, function(x, split) strsplit(x, split)[[1]], split = ":")

i.e. taking the first element of the inner list (which has only 1 element anyway) in each case.
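One way to check this, sketched with identical(); the names direct, nested and unwrapped are only illustrative.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

direct <- strsplit(pioneers, split = ":")
nested <- lapply(pioneers, strsplit, split = ":")

# Unwrap the one-element inner list inside each lapply call
unwrapped <- lapply(pioneers,
                    function(x, split) strsplit(x, split)[[1]],
                    split = ":")

identical(direct, nested)     # FALSE: nested has an extra list level
identical(direct, unwrapped)  # TRUE
```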

Upvotes: 2
