neves

Reputation: 846

`dplyr::summarise` does not accept external functions

I have the following dataset:

dataset=structure(list(var1 = c(28.5627505742013, 22.8311421908438, 95.2216156944633, 
43.9405107684433, 97.11211245507, 48.4108281508088, 77.1804554760456, 
27.1229329891503, 69.5863061584532, 87.2112890332937), var2 = c(32.9009465128183, 
54.1136392951012, 69.3181485682726, 70.2100433968008, 44.0986660309136, 
62.8759404085577, 79.4413498230278, 97.4315509572625, 62.2505457513034, 
76.0133410431445), var3 = c(89.6971945464611, 67.174579706043, 
37.0924087055027, 87.7977314218879, 29.3221596442163, 37.5143952667713, 
62.6237869635224, 71.3644423149526, 95.3462834469974, 27.4587387405336
), var4 = c(41.5336912125349, 98.2095112837851, 80.7970978319645, 
91.1278881691396, 66.4086666144431, 69.2618868127465, 67.7560870349407, 
71.4932355284691, 21.345994155854, 31.1811877787113), var5 = c(33.9312525652349, 
88.1815139763057, 98.4453701227903, 25.0217059068382, 41.1195872165263, 
37.0983888953924, 66.0217586159706, 23.8814191706479, 40.9594196081161, 
79.7632974945009), var6 = c(39.813664201647, 80.6405956856906, 
30.0273275375366, 34.6203793399036, 96.5195455029607, 44.5830867439508, 
78.7370151281357, 42.010761089623, 23.0079878121614, 58.0372223630548
), kmeans = structure(c(2L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 2L, 3L
), .Label = c("1", "2", "3"), class = "factor")), .Names = c("var1", 
"var2", "var3", "var4", "var5", "var6", "kmeans"), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

And the following function:

myfun<-function(x){
  c(sum(x),mean(x),sd(x))
}

With built-in functions passed directly to summarise_if, the result is fine:

library(tidyverse)

my1<-dataset%>%
  summarise_if(.,is.numeric,.funs=funs(sum,mean,sd))

But with myfun it doesn't work:

my2<-dataset%>%
  summarise_if(.,is.numeric,.funs=funs(myfun))

Error in summarise_impl(.data, dots) : Column var1 must be length 1 (a summary value), not 3

What's the problem?

Upvotes: 1

Views: 110

Answers (3)

JMueller

Reputation: 123

It's not the most elegant way, but if your external function just combines other functions, you can pass those functions as a list instead:

myfun_ls <- list(sum,mean,sd)
my2<-dataset%>%
  summarise_if(.,is.numeric,.funs=myfun_ls)
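
A variation on the list idea, as a minimal sketch: naming the list elements should give readable output column names of the form column_function. The example uses mtcars and summarise_at so it is self-contained; the object names stats and res are only illustrative, and the scoped verbs are superseded in recent dplyr versions, so treat this as an assumption about your installed version.

```r
library(dplyr)

# Named list of summary functions; the names become column suffixes
stats <- list(sum = sum, mean = mean, sd = sd)

# Apply every function in the list to the two selected columns
res <- mtcars %>%
  summarise_at(c("mpg", "drat"), .funs = stats)

# One row, with columns such as mpg_sum, mpg_mean, drat_sd
names(res)
```

The same named list can be passed to summarise_if(is.numeric, .funs = stats) for the dataset in the question.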

Upvotes: 1

Ronak Shah

Reputation: 388907

When you run this call:

dataset%>% summarise_if(is.numeric,.funs=funs(sum,mean,sd))

You are applying three different functions (sum, mean and sd), and each one is applied to every numeric column individually. Each function returns a single value per column, so you have three different functions returning three separate values.

Regarding your function, I think what you were trying to do was

myfun<-function(x){
  c(sum(x),mean(x),sd(x))
}

Now, when this function is applied to one column, it returns three values, so here a single function returns three values instead of one.

myfun(dataset$var1)
#[1] 597.17994  59.71799  29.03549

As @NelsonGon mentioned in the comments, you are trying to store three values in a single cell of a column. You could return them as a list, as @PKumar showed, or some variation of do would also help you achieve that. If you break the function down into three separate functions, it works the same way as your first example.

myfun1 <- function(x) sum(x)
myfun2  <- function(x) mean(x)
myfun3 <- function(x) sd(x)

dataset %>% summarise_if(is.numeric,.funs=funs(myfun1,myfun2,myfun3))
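
For readers on dplyr 1.0.0 or later, the same idea can probably be written with across(), which supersedes the scoped verbs and funs(). This is only a sketch under that version assumption; it uses mtcars so that it runs on its own, and res is an illustrative name.

```r
library(dplyr)

# Each function returns a single value, so summarise() accepts all of them.
# The list names are appended to the column names (mpg_sum, mpg_mean, ...).
res <- mtcars %>%
  summarise(across(where(is.numeric),
                   list(sum = sum, mean = mean, sd = sd)))

names(res)
```

The key point is unchanged: every function handed to summarise must return exactly one value per column.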

Upvotes: 2

PKumar

Reputation: 11128

You can try this approach. Your approach does not work because summarise cannot fit the three values returned by your custom function into a single cell. To circumvent the problem, I used enframe with list in the custom function:

library(tidyverse)

myfun<-function(x){
    return(list(enframe(c('sum' = sum(x),'mean' = mean(x),'sd' = sd(x)))))
}

For example with mtcars data:

my2<-mtcars%>%
summarise_at(c('mpg','drat'), function(x) myfun(x)) %>% 
unnest() %>% 
select(-name1) %>% 
set_names(nm = c('name', 'mpg', 'drat'))

It will yield:

  name        mpg        drat
1  sum 642.900000 115.0900000
2 mean  20.090625   3.5965625
3   sd   6.026948   0.5346787

Alternatively, you can solve it using purrr.

For example:

f <- function(x,...){
    list('mean' = mean(x, ...),'sum' = sum(x, ...))
}

mtcars %>% 
select(mpg, drat) %>% 
map_dfr(~ f(.x, na.rm = TRUE), .id = "Name") %>% 
data.frame()
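
If a function that returns several values is really what you want, base R may be enough, since sapply does not require length-1 results. The sketch below is not part of the approaches above; myfun_named and m are hypothetical names, and mtcars is used to keep it self-contained.

```r
# Hypothetical variant of myfun that labels its three results
myfun_named <- function(x) {
  c(sum = sum(x), mean = mean(x), sd = sd(x))
}

# sapply binds the length-3 results into a matrix:
# one row per statistic, one column per variable
m <- sapply(mtcars[c("mpg", "drat")], myfun_named)

m
```

The result is a numeric matrix rather than a tibble, so wrap it in as.data.frame() if later steps expect a data frame.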

Upvotes: 3
