Reputation: 846
I have the following dataset:
dataset=structure(list(var1 = c(28.5627505742013, 22.8311421908438, 95.2216156944633,
43.9405107684433, 97.11211245507, 48.4108281508088, 77.1804554760456,
27.1229329891503, 69.5863061584532, 87.2112890332937), var2 = c(32.9009465128183,
54.1136392951012, 69.3181485682726, 70.2100433968008, 44.0986660309136,
62.8759404085577, 79.4413498230278, 97.4315509572625, 62.2505457513034,
76.0133410431445), var3 = c(89.6971945464611, 67.174579706043,
37.0924087055027, 87.7977314218879, 29.3221596442163, 37.5143952667713,
62.6237869635224, 71.3644423149526, 95.3462834469974, 27.4587387405336
), var4 = c(41.5336912125349, 98.2095112837851, 80.7970978319645,
91.1278881691396, 66.4086666144431, 69.2618868127465, 67.7560870349407,
71.4932355284691, 21.345994155854, 31.1811877787113), var5 = c(33.9312525652349,
88.1815139763057, 98.4453701227903, 25.0217059068382, 41.1195872165263,
37.0983888953924, 66.0217586159706, 23.8814191706479, 40.9594196081161,
79.7632974945009), var6 = c(39.813664201647, 80.6405956856906,
30.0273275375366, 34.6203793399036, 96.5195455029607, 44.5830867439508,
78.7370151281357, 42.010761089623, 23.0079878121614, 58.0372223630548
), kmeans = structure(c(2L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 2L, 3L
), .Label = c("1", "2", "3"), class = "factor")), .Names = c("var1",
"var2", "var3", "var4", "var5", "var6", "kmeans"), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
And the following function:
myfun<-function(x){
c(sum(x),mean(x),sd(x))
}
With dplyr::summarise alone, the result is OK:
library(tidyverse)
my1<-dataset%>%
summarise_if(.,is.numeric,.funs=funs(sum,mean,sd))
But with myfun it doesn't work:
my2<-dataset%>%
summarise_if(.,is.numeric,.funs=funs(myfun))
Error in summarise_impl(.data, dots) : Column `var1` must be length 1 (a summary value), not 3
What's the problem?
Upvotes: 1
Views: 110
Reputation: 123
It's not the most elegant way, but if your external function just bundles several other functions, you can pass those functions as a list instead:
myfun_ls <- list(sum,mean,sd)
my2<-dataset%>%
summarise_if(.,is.numeric,.funs=myfun_ls)
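If you want readable column names, a named list may help. The following is a minimal sketch, assuming a dplyr version where summarise_if accepts a named list of functions; the name stat_funs is made up for illustration, and iris is used only because it ships with R and has one non-numeric column.

```r
library(dplyr)

# A named list: each name becomes the suffix of the output columns
# (hypothetical object name, not from the original post).
stat_funs <- list(sum = sum, mean = mean, sd = sd)

# Only the numeric columns are summarised; Species (a factor) is skipped.
res_named <- iris %>%
  summarise_if(is.numeric, .funs = stat_funs)

# Inspect the generated names, e.g. Sepal.Length_sum, Sepal.Length_mean, ...
names(res_named)
```

Each function still returns one value per column, which is why this form does not hit the "must be length 1" error.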
Upvotes: 1
Reputation: 388907
When you run this call:
dataset%>% summarise_if(is.numeric,.funs=funs(sum,mean,sd))
You are applying three different functions (sum, mean and sd), each of which is applied to every numeric column individually. Each function returns a single value per column, so every output cell holds exactly one value.
Regarding your function, I think what you were trying to do was
myfun<-function(x){
c(sum(x),mean(x),sd(x))
}
Now, when this function is applied to one column, it returns three values at once: a single function is producing three values instead of one.
myfun(dataset$var1)
#[1] 597.17994 59.71799 29.03549
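To make the length mismatch concrete, here is a small base R sketch. It uses the built-in mtcars data rather than the question's dataset, and it adds names to the returned vector purely for readability; both choices are mine, not the asker's.

```r
# Same computation as the question's function, with names added for readability.
myfun <- function(x) c(sum = sum(x), mean = mean(x), sd = sd(x))

length(sum(mtcars$mpg))    # a single summary value
length(myfun(mtcars$mpg))  # three values, too many for one summarise() cell

# sapply() has no length-1 restriction: it binds the results into a matrix
# with one row per statistic and one column per variable.
stats_mat <- sapply(mtcars[c("mpg", "drat")], myfun)
stats_mat
```

If a plain matrix is acceptable, this sidesteps summarise entirely.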
As @NelsonGon mentioned in the comments, you are trying to store three values in a single cell of one column. You could return them as a list, as @Pkumar showed, or use some variation of do. If you split the function into three separate functions, it works the same way as your first example.
myfun1 <- function(x) sum(x)
myfun2 <- function(x) mean(x)
myfun3 <- function(x) sd(x)
dataset %>% summarise_if(is.numeric,.funs=funs(myfun1,myfun2,myfun3))
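For readers on a newer dplyr: funs() has since been deprecated, and the same idea can be written with across(). This is a sketch assuming dplyr 1.0.0 or later, shown on the built-in mtcars data rather than the question's dataset.

```r
library(dplyr)

# One named function per statistic; across() applies each of them to
# every numeric column and names the results "<column>_<function>".
res_across <- mtcars %>%
  summarise(across(where(is.numeric),
                   list(sum = sum, mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

names(res_across)
```

The constraint is unchanged: each function in the list must return exactly one value per column.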
Upvotes: 2
Reputation: 11128
You can try this approach. Your original version fails because summarise cannot fit the three values returned by your custom function into a single cell. To circumvent the problem, I used enframe with list in the custom function:
library(tidyverse)
myfun<-function(x){
return(list(enframe(c('sum' = sum(x),'mean' = mean(x),'sd' = sd(x)))))
}
For example, with the mtcars data:
my2<-mtcars%>%
summarise_at(c('mpg','drat'), function(x) myfun(x)) %>%
unnest() %>%
select(-name1) %>%
set_names(nm = c('name', 'mpg', 'drat'))
It yields:
  name        mpg        drat
1 sum  642.900000 115.0900000
2 mean  20.090625   3.5965625
3 sd     6.026948   0.5346787
Alternatively, you can solve it with purrr. For example:
f <- function(x,...){
list('mean' = mean(x, ...),'sum' = sum(x, ...))
}
mtcars %>%
select(mpg, drat) %>%
map_dfr(~ f(.x, na.rm = TRUE), .id = "Name") %>%
data.frame()
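On a recent dplyr there may be a shorter route to the same long layout. The sketch below assumes dplyr 1.1.0 or later, where reframe() accepts results of any length; the stat column and the res_long name are added by me for illustration.

```r
library(dplyr)

# The question's function, unchanged: three values per column.
myfun <- function(x) c(sum(x), mean(x), sd(x))

# reframe() allows multi-row results, so each column contributes three rows.
# The labels in `stat` must be listed in the same order as myfun's output.
res_long <- mtcars %>%
  reframe(stat = c("sum", "mean", "sd"),
          across(c(mpg, drat), myfun))

res_long
```

This keeps the original myfun intact, at the cost of maintaining the label vector by hand.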
Upvotes: 3