Vincent Risington

Reputation: 177

R Percentiles of data frame with non-zero subset of observations

I would like to calculate the percentiles of the following tibble...

I have only the non-zero subset (5 rows) of 10 observations in each of 3 variables, i.e.:

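library(dplyr)  # provides tibble(), %>%, group_by() and summarise_if()

n <- 10  # total number of observations, including the excluded zeros
tibb <- tibble(
  x = 1:5, 
  y = 1, 
  z = x ^ 2 + y)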

(The excluded observations are all zero)

Therefore the mean is the sum of the fields / 10 (as opposed to / 5):

meantibb  <-  tibb %>% group_by() %>% 
  summarise_if(is.numeric,  sum, na.rm = TRUE) / n
meantibb

How do I get the following percentiles of x, y and z in the tibble please?

perciles <- c(0.5, 0.75)
percentiles <- function(p) quantile(p, perciles)

Thank you

Upvotes: 2

Views: 476

Answers (2)

papamo

Reputation: 56

You could create a data set that includes the zeroes:

missingRowCount <- n - nrow(tibb)
colCount <- ncol(tibb)
# Build the missing zero rows with the same column names as tibb
zeroTibb <- matrix(0, nrow = missingRowCount, ncol = colCount,
                   dimnames = list(NULL, colnames(tibb))) %>%
  as_tibble()
allTibb <- dplyr::bind_rows(tibb, zeroTibb)

Once you have the full data, you can run the following to get a tibble of percentiles:

# Keep the percentile labels ("50%", "75%") as a column
percTibble <- sapply(allTibb, percentiles) %>%
  as_tibble(rownames = "percentile")

The assumption here is that the data is not going to be too large when the zeroes are included.
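
If building the full zero-padded tibble is a concern, the same idea can be sketched per column instead. This is only an illustration under the question's setup (n total observations, excluded values all zero); padded_quantile is a hypothetical helper name, not something from the question.

```r
library(tibble)

n <- 10
tibb <- tibble(x = 1:5, y = 1, z = x ^ 2 + y)
perciles <- c(0.5, 0.75)

# Hypothetical helper: pad one column with zeros up to n observations,
# then take the requested quantiles of the padded vector.
padded_quantile <- function(col, n, probs) {
  quantile(c(col, rep(0, n - length(col))), probs)
}

# Matrix with one column per variable and one row per percentile
percMatrix <- sapply(tibb, padded_quantile, n = n, probs = perciles)
percMatrix
```

Only one padded column exists at a time here, so the zeros are never stored for the whole table.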

Upvotes: 2

stoa

Reputation: 93

You're close. Calculating the mean (and then the percentiles) is simpler if you use gather first and then group the data by the three variables.

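library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr
n <- 10
tibb <- tibble(x = 1:5, y = 1, z = x ^ 2 + y)
# Reshape to long format (one row per variable/value pair), then summarise per variable
tibb %>% 
  gather("fctr", "value") %>% 
  group_by(fctr) %>% 
  summarise(mean = sum(value) / n,
            perc_50 = quantile(value, 0.5),
            perc_75 = quantile(value, 0.75))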

However, I'm not sure whether you want the percentiles of the non-zero subset or of the entire dataset, because the choice changes the results. For example:

> x = 1:5
> quantile(x, 0.1)
10% 
1.4 

> test <- c(0,0,0,0,0,1,2,3,4,5)
> quantile(test, 0.1)  
10% 
  0 
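
To see the difference for all three variables at once, here is a rough sketch that puts both versions side by side. It assumes a tidyr version that has pivot_longer (1.0 or later), which does the same reshaping as gather; with_zeros is a hypothetical helper introduced only for this example.

```r
library(dplyr)
library(tidyr)

n <- 10
tibb <- tibble(x = 1:5, y = 1, z = x ^ 2 + y)

# Hypothetical helper: append the excluded zeros so the vector has n values
with_zeros <- function(v, n) c(v, rep(0, n - length(v)))

comparison <- tibb %>%
  pivot_longer(everything(), names_to = "fctr", values_to = "value") %>%
  group_by(fctr) %>%
  summarise(
    # 75th percentile of the non-zero subset only
    perc_75_nonzero = unname(quantile(value, 0.75)),
    # 75th percentile once the zeros are counted as observations
    perc_75_all     = unname(quantile(with_zeros(value, n), 0.75))
  )
comparison
```

Whenever the two columns differ for a variable, the zeros matter for that percentile, so it is worth deciding up front which population you mean.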

Upvotes: 1
