keherder

Reputation: 65

Create New Dataframe of Summary (90th Percentile) Statistics for Multiple Rows of All Columns in R

I am working on a school project and have a data set of 4,000 rows. There are 40 participants, each with about 100 rows. I want to create a data set that collapses the rows for each participant into summary statistics, ideally the 90th percentile. I know how to find the mean values with dplyr:

Means <- bladder %>% 
  group_by(id, group) %>% 
  summarise(across(everything(), list(mean)))

This works great. Is there a way to do the same thing but get the 90th percentiles instead of the means?

Thank you!!

Upvotes: 4

Views: 694

Answers (2)

Kumar

Reputation: 188

The following code also gives the solution:

library(dplyr)

# 90th percentile of every remaining column, per id/rx group
Percentile90 <- survival::bladder %>% 
  group_by(id, rx) %>% 
  summarise(across(everything(), quantile, probs = 0.9, na.rm = TRUE))

Upvotes: 0

benson23

Reputation: 19097

The function for calculating percentiles in R is quantile(). We can specify probs = 0.9 to get the 90th percentile.
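
As a quick illustration of what quantile() returns on a plain vector, here is a minimal sketch. The vector `x` is made up for the example and is not from the question's data; the result relies on R's default interpolation method (type 7).

```r
# Hypothetical toy vector, not from the question's data
x <- c(1, 2, 3, 4, 5)

# 90th percentile with the default method (type = 7, linear interpolation);
# names = FALSE drops the "90%" label so a bare number is returned
p90 <- quantile(x, probs = 0.9, names = FALSE)
p90
```

With the default method the value is interpolated between the 4th and 5th sorted observations, so it need not be one of the observed values.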

Here I use the bladder dataset from the survival package to demonstrate.

library(dplyr)

survival::bladder %>% 
  group_by(id, rx) %>% 
  summarize(across(everything(), quantile, probs = 0.9), .groups = "drop")

# A tibble: 85 × 7
      id    rx number  size  stop event  enum
   <int> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1      1     3     1   0     3.7
 2     2     1      2     1     4   0     3.7
 3     3     1      1     1     7   0     3.7
 4     4     1      5     1    10   0     3.7
 5     5     1      4     1    10   0.7   3.7
 6     6     1      1     1    14   0     3.7
 7     7     1      1     1    18   0     3.7
 8     8     1      1     3    18   0.7   3.7
 9     9     1      1     1    18   1     3.7
10    10     1      3     3    23   0     3.7
# … with 75 more rows
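
In dplyr 1.1.0 and later, passing extra arguments such as probs through across() is deprecated in favour of wrapping the call in a function. A minimal sketch of that form follows, using a made-up data frame `toy` (the column names `x` and `y` are assumptions for illustration, with `id` and `group` mirroring the question):

```r
library(dplyr)

# Hypothetical stand-in for the asker's data: 2 participants, 5 rows each
toy <- data.frame(
  id    = rep(1:2, each = 5),
  group = rep(c("a", "b"), each = 5),
  x     = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50),
  y     = c(5, 4, 3, 2, 1, 0, 0, 0, 0, 100)
)

# One row per participant; every non-grouping column is replaced by its
# 90th percentile. The anonymous function carries the quantile() arguments.
p90 <- toy %>%
  group_by(id, group) %>%
  summarise(
    across(everything(),
           function(v) quantile(v, probs = 0.9, na.rm = TRUE, names = FALSE)),
    .groups = "drop"
  )

p90
```

Swapping 0.9 for another probability gives a different percentile, and na.rm = TRUE is only needed if the columns may contain missing values.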

Upvotes: 4
