equanimity
equanimity

Reputation: 2533

Percentile Data in Tabular Form in R

I have a dataframe (df) that looks as follows:

Date        Model   Color   Value  Samples
6/19/17     Gold    Blue    0.50   500
6/19/17     Gold    Red     1.25   449
6/19/17     Silver  Blue    0.75   1320
6/19/17     Silver  Blue    1.50   103
6/19/17     Gold    Red     0.70   891
6/19/17     Gold    Blue    0.41   18103
6/19/17     Copper  Blue    0.83   564

I can output the percentile data for each Color variable using the following:

df_subset <- subset(df, df$Color == 'Blue')
quantile(df_subset$Value, c(0.50, 0.99, 0.999, 0.9999))

But, how would I do the following?

a) add the Samples column to the output

b) add multiple rows to the output (i.e. one row for each distinct value of the Model variable)

An example would be as follows:

            |  # Samples  |  50th percentile |  99th percentile |  99.9th percentile |  99.99th percentile
Gold
Silver
Copper

Thanks in advance for your help!

Upvotes: 0

Views: 1262

Answers (2)

Eric Watt
Eric Watt

Reputation: 3230

library(data.table)

dat <- data.table(Date = "6/19/17",
                  Model = c("Gold", "Gold", "Silver", "Silver", "Gold", "Gold", "Copper"),
                  Color = c("Blue", "Red", "Blue", "Blue", "Red", "Blue", "Blue"),
                  Value = c(0.5, 1.25, .75, 1.5, .7, .41, .83),
                  Samples = c(500, 449, 1320, 103, 891, 18103, 564))

dat[, .(Samples = sum(Samples),
        `50th percentile` = quantile(Value, probs = c(0.5)),
        `99th percentile` = quantile(Value, probs = c(0.99)),
        `99.9th percentile` = quantile(Value, probs = c(0.999)),
        `99.99th percentile` = quantile(Value, probs = c(0.9999))), 
    by = Model]

Results in:

    Model Samples 50th percentile 99th percentile 99.9th percentile 99.99th percentile
1:   Gold   19943           0.600          1.2335           1.24835           1.249835
2: Silver    1423           1.125          1.4925           1.49925           1.499925
3: Copper     564           0.830          0.8300           0.83000           0.830000

Upvotes: 1

www
www

Reputation: 39154

A solution using dplyr. dt2 is the final output.

dt <- read.table(text = "Date        Model   Color   Value  Samples
6/19/17     Gold    Blue    0.50   500
                 6/19/17     Gold    Red     1.25   449
                 6/19/17     Silver  Blue    0.75   1320
                 6/19/17     Silver  Blue    1.50   103
                 6/19/17     Gold    Red     0.70   891
                 6/19/17     Gold    Blue    0.41   18103
                 6/19/17     Copper  Blue    0.83   564",
                 header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

dt2 <- dt %>%
  group_by(Model) %>%
  summarise(Samples = sum(Samples),
            `50th percentile` = quantile(Value, 0.5),
            `99th percentile` = quantile(Value, 0.99),
            `99.9th percentile` = quantile(Value, 0.999),
            `99.99th percentile` = quantile(Value, 0.9999))

Upvotes: 0

Related Questions