tumultous_rooster

Reputation: 12580

mutate_each in dplyr not working column-wise over dataframe

In this case, I'm attempting to apply the quantile function to this example dataframe:

DF <- as.data.frame(matrix(runif(9, 1, 10),ncol=3,nrow=3))

DF_of_quantiles <- DF %>% 
  mutate_each(funs(quantile(DF,c(0.98), na.rm=TRUE)))

But mutate_each does not perform the function over the columns:

View(DF_of_quantiles)

gives

    V1  V2  V3
1   9.822732    9.822732    9.822732
2   9.822732    9.822732    9.822732
3   9.822732    9.822732    9.822732

Notice that

View(quantile(DF,c(0.98), na.rm=TRUE))

gives the same value:

    row.names   x
1   98% 9.822732

What am I doing wrong?

Upvotes: 2

Views: 1959

Answers (2)

Rich Scriven

Reputation: 99371

When using dplyr::funs(), you need to use . as a placeholder for the data in the argument you want to pass it to, so the call should be written

quantile(., 0.98, na.rm = TRUE)

inside funs. Additionally, for this operation I think you may prefer summarise_each.

library(dplyr)
summarise_each(DF, funs(quantile(., 0.98, na.rm=TRUE)))
#         V1       V2       V3
# 1 4.868255 6.937773 7.864751

If you pass DF to quantile through funs, you'll receive a result that is the same as calling quantile on the entire data frame:

summarise_each(DF, funs(quantile(DF, 0.98, na.rm=TRUE)))
#         V1       V2       V3
# 1 7.830681 7.830681 7.830681
quantile(as.matrix(DF), 0.98, names = FALSE)
# [1] 7.830681 

That is what you are seeing as a result of your mutate_each call, but it is not what you want. Also, mutate_each with . gives correct but undesirable results, because the single quantile of each column is repeated on every row:

mutate_each(DF, funs(quantile(., 0.98, na.rm=TRUE)))
#         V1       V2       V3
# 1 4.868255 6.937773 7.864751
# 2 4.868255 6.937773 7.864751
# 3 4.868255 6.937773 7.864751

Check:

vapply(DF, quantile, 1, 0.98)
#       V1       V2       V3 
# 4.868255 6.937773 7.864751 
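For readers on a current dplyr, where funs() and the *_each verbs are deprecated, the same idea can be sketched with across(). This is a minimal sketch assuming dplyr >= 1.0.0; the seed and the names col_q and col_q_rep are only illustrative and are not part of the original question.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
DF <- as.data.frame(matrix(runif(9, 1, 10), ncol = 3, nrow = 3))

# One row: the 98th percentile of each column (.x plays the role of . in funs)
col_q <- DF %>%
  summarise(across(everything(),
                   ~ quantile(.x, 0.98, na.rm = TRUE, names = FALSE)))

# Same per-column value recycled onto every row, like mutate_each with .
col_q_rep <- DF %>%
  mutate(across(everything(),
                ~ quantile(.x, 0.98, na.rm = TRUE, names = FALSE)))
```

As with summarise_each versus mutate_each, the summarise version is usually the one you want here, since the quantile is a single number per column.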

Upvotes: 3

B.Mr.W.

Reputation: 19648

In case someone comes across this question and is OK with the built-in function apply:

# 2 means column-wise, 1 means row-wise
> apply(DF, 2, function(x)quantile(x, 0.5))
      V1       V2       V3 
5.953192 8.144576 3.528949 

Thanks to thelatemail's suggestion, I added the lapply and sapply versions and their output.

> lapply(DF, function(x)quantile(x, 0.5))
$V1
50% 
5.953192 

$V2
50% 
8.144576 

$V3
50% 
3.528949 

> sapply(DF, function(x)quantile(x, 0.5))
V1.50%   V2.50%   V3.50% 
5.953192 8.144576 3.528949 
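If the "50%" suffix in the sapply names is unwanted, quantile's names = FALSE argument drops it. A small sketch, assuming a DF built the same way as in the question (the seed and the name med are only illustrative):

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
DF <- as.data.frame(matrix(runif(9, 1, 10), ncol = 3, nrow = 3))

# names = FALSE makes quantile return an unnamed value,
# so the result is named by column only: V1, V2, V3
med <- sapply(DF, quantile, probs = 0.5, names = FALSE)
print(med)
```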

Upvotes: 3
