Reputation: 177
I would like to calculate the percentiles of the following tibble...
I have only the non-zero subset (5 rows) of the 10 observations in each of 3 variables, i.e.:
library(dplyr)  # provides tibble() and %>%
n <- 10
tibb <- tibble(
  x = 1:5,
  y = 1,
  z = x ^ 2 + y)
(The excluded observations are all zero)
Therefore the mean is the sum of each field divided by 10 (rather than by 5):
meantibb <- tibb %>% group_by() %>%
  summarise_if(is.numeric, sum, na.rm = TRUE) / n
meantibb
How do I get the following percentiles of x, y and z in the tibble please?
perciles <- c(0.5, 0.75)
percentiles <- function(p) quantile(p, perciles)
Thank you
Upvotes: 2
Views: 476
Reputation: 56
You could create a data set that includes the zeroes:
missingRowCount <- n - nrow(tibb)
colCount <- ncol(tibb)
# Name the columns up front so as_tibble() does not have to repair them
zeroTibb <- matrix(0, nrow = missingRowCount, ncol = colCount,
                   dimnames = list(NULL, colnames(tibb))) %>%
  as_tibble()
allTibb <- dplyr::bind_rows(tibb, zeroTibb)
Once you have the full data, you can run the following to get a tibble of percentiles:
# as.tibble() is deprecated; as_tibble() is the current name
percTibble <- sapply(allTibb, percentiles) %>%
  as_tibble()
The assumption here is that the data is not going to be too large when the zeroes are included.
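If building the padded tibble is a concern, a rough alternative is to pad one column at a time. This is only a sketch: `padded_quantile` is a made-up helper name, and it assumes every excluded observation is exactly zero, as stated in the question.

```r
# Non-zero rows from the question; n is the full number of observations.
n <- 10
nonzero <- data.frame(x = 1:5, y = 1, z = (1:5)^2 + 1)
probs <- c(0.5, 0.75)

# Hypothetical helper: append the excluded zeroes to a single column,
# then take the requested quantiles of the padded vector.
padded_quantile <- function(v, n, probs) {
  stopifnot(length(v) <= n)
  quantile(c(v, rep(0, n - length(v))), probs = probs)
}

# Apply column by column, so the full padded data frame is never built.
result <- sapply(nonzero, padded_quantile, n = n, probs = probs)
result
```

The result is a matrix with one row per percentile and one column per variable, which should match what `sapply(allTibb, percentiles)` gives on the padded data.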
Upvotes: 2
Reputation: 93
You're close. Your method of computing the mean (and subsequently the percentiles) could be simpler if you use gather first and then group the data by the three variables.
library(dplyr)
library(tidyr)  # gather() comes from tidyr
n <- 10
tibb <- tibble(x = 1:5, y = 1, z = x ^ 2 + y)
tibb %>%
  gather("fctr", "value") %>%
  group_by(fctr) %>%
  summarise(mean = sum(value) / n,
            perc_50 = quantile(value, 0.5),
            perc_75 = quantile(value, 0.75))
However, I'm not sure whether you want the percentiles of the non-zero subset or of the entire dataset, because the two give different results, e.g.:
> x = 1:5
> quantile(x, 0.1)
10%
1.4
> test <- c(0,0,0,0,0,1,2,3,4,5)
> quantile(test, 0.1)
10%
0
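To see both readings side by side for the tibble in the question, something like the following might help. It is a sketch rather than a definitive answer: `n_total` is just a renamed `n` (to avoid confusion with dplyr's `n()`), `pivot_longer()` is used as the newer tidyr equivalent of `gather()`, and the column names `perc_50_nonzero` and `perc_50_all` are invented for illustration.

```r
library(dplyr)
library(tidyr)

n_total <- 10  # full number of observations, zeroes included
tibb <- tibble(x = 1:5, y = 1, z = x^2 + y)

res <- tibb %>%
  pivot_longer(everything(), names_to = "fctr", values_to = "value") %>%
  group_by(fctr) %>%
  summarise(
    # median of the non-zero subset only
    perc_50_nonzero = quantile(value, 0.5, names = FALSE),
    # median after padding each group with its missing zeroes
    perc_50_all = quantile(c(value, rep(0, n_total - n())), 0.5, names = FALSE)
  )
res
```

Whichever column matches the intended definition is the one to keep; the same padding works for any other probability passed to `quantile()`.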
Upvotes: 1