KT_1
KT_1

Reputation: 8494

Create new df and record percentage by groups in R

For a sample dataframe:

df <- structure(list(name = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", 
"v", "w", "x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", 
"v", "w", "x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", 
"v", "w", "x", "y", "z"), amount = c(11L, 9L, 5L, 13L, 15L, 16L, 
2L, 5L, 6L, 8L, 9L, 15L, 16L, 17L, 13L, 11L, 10L, 9L, 8L, 7L, 
6L, 8L, 15L, 16L, 15L, 9L, 8L, 7L, 6L, 5L, 18L, 16L, 1L, 14L, 
15L, 13L, 12L, 11L, 10L, 9L, 8L, 5L, 6L, 9L, 10L, 12L, 13L, 6L, 
8L, 15L, 16L, 15L, 9L, 8L, 7L, 6L, 5L, 18L, 16L, 1L, 14L, 15L, 
13L, 12L, 11L, 10L, 9L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 22L, 
17L, 16L, 8L), decile = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L), time = c(2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L)), .Names = c("name", "amount", "decile", 
"time"), row.names = c(NA, -78L), class = c("tbl_df", "tbl", 
"data.frame"), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), amount = structure(list(), class = c("collector_integer", 
"collector")), decile = structure(list(), class = c("collector_integer", 
"collector")), time = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("name", "amount", "decile", "time"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

I wish to produce a summary table detailing a count of the number of items by decile. One solution is listed below:

data.frame(table(df$decile))

Alongside this, I want to add an additional column which records the percentage of 'names' (essential rows) which have an 'amount' greater or equal to 10 BY decile.

Any ideas how to complete my coding?

Upvotes: 0

Views: 47

Answers (1)

MartijnVanAttekum
MartijnVanAttekum

Reputation: 1445

If you don't mind using tidyverse:

library(tidyverse)
df %>% group_by(decile) %>% summarize(mean(amount>=10))

    decile      `mean(amount >= 10)`
    <int>                <dbl>
 1      1                0.556
 2      2                0.444
 3      3                0.556
 4      4                0.667
 5      5                0.667
 6      6                0.667
 7      7                0.5  
 8      8                0.333
 9      9                0.667
10     10                0.667

Calculates per decile how many rows have an amount value >= 10. Is that what you want? (do you mean percentage of "rows" rather than "names"?)

Upvotes: 1

Related Questions