KT_1

Reputation: 8474

Give percentage by group in R

For a sample dataframe:

df1 <- structure(list(i.d = structure(1:9, .Label = c("a", "b", "c",
    "d", "e", "f", "g", "h", "i"), class = "factor"),
    group = c(1L, 1L, 2L, 1L, 3L, 3L, 2L, 2L, 1L),
    cat = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, NA)),
    .Names = c("i.d", "group", "cat"), class = "data.frame",
    row.names = c(NA, -9L))

I wish to add a column ("pc.cat") to my dataframe that records the percentage of 1s in column cat, by the group ID variable.

For example, there are four values in group 1 (i.d's a, b, d and i). The value for 'i' is NA, so it can be ignored for now. Only one of the three remaining values is 1, so the percentage would read 33.33 (to 2 dp). This value should be populated into column 'pc.cat' for every row in group 1 (even the rows where cat is NA). The process would then be repeated for the other groups (2 and 3).
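
To make the arithmetic for group 1 concrete, here is a minimal base R sketch of the calculation for that one group. The compact data.frame() call and the names g1 and pc_g1 are shorthand introduced for illustration only; the values are the same as in df1.

```r
# Rebuild the sample data in compact form (same values as df1)
df1 <- data.frame(
  i.d   = factor(letters[1:9]),
  group = c(1L, 1L, 2L, 1L, 3L, 3L, 2L, 2L, 1L),
  cat   = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, NA)
)

# Values of cat in group 1 (i.d's a, b, d, i): 0, 0, 1, NA
g1 <- df1$cat[df1$group == 1]

# na.rm = TRUE drops the NA, so the mean is 1 / 3
pc_g1 <- round(100 * mean(g1, na.rm = TRUE), 2)
pc_g1
# [1] 33.33
```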

If anyone could help me with the code for this I would greatly appreciate it.

Upvotes: 2

Views: 525

Answers (3)

Rentrop

Reputation: 21497

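library(data.table)

setDT(df1)  # convert df1 to a data.table by reference

# Proportion of 1s in cat per group, ignoring NA rows (multiply by 100 for a percentage)
df1[!is.na(cat), mean(cat), by=group]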
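
To attach the value to every row instead of returning a one-row-per-group summary, one option is data.table's `:=` operator combined with `by`. This is a minimal sketch rather than a definitive solution: it assumes the data is rebuilt compactly with data.frame() and rounds to 2 dp as the question asks.

```r
library(data.table)

# Rebuild the sample data in compact form (same values as df1)
df1 <- data.frame(
  i.d   = factor(letters[1:9]),
  group = c(1L, 1L, 2L, 1L, 3L, 3L, 2L, 2L, 1L),
  cat   = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, NA)
)
setDT(df1)

# Add pc.cat by reference: percentage of 1s per group, NA ignored,
# recycled to every row of the group (including rows where cat is NA)
df1[, pc.cat := round(100 * mean(cat, na.rm = TRUE), 2), by = group]
df1
```

Because `:=` modifies df1 in place, no reassignment is needed.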

Upvotes: 2

Chris

Reputation: 6362

With data.table:

library(data.table)
DT <- data.table(df1)
# Divide by the count of non-NA values so NA rows are excluded from the denominator
DT[, list(sum(na.omit(cat))/length(na.omit(cat))), by = "group"]

Upvotes: 1

josliber

Reputation: 44309

This can be accomplished with the ave function:

df1$pc.cat <- ave(df1$cat, df1$group, FUN=function(x) 100*mean(na.omit(x)))
df1
#   i.d group cat   pc.cat
# 1   a     1   0 33.33333
# 2   b     1   0 33.33333
# 3   c     2   1 66.66667
# 4   d     1   1 33.33333
# 5   e     3   0  0.00000
# 6   f     3   0  0.00000
# 7   g     2   1 66.66667
# 8   h     2   0 66.66667
# 9   i     1  NA 33.33333
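
If the 2 dp rounding from the question is wanted, the ave() result can be wrapped in round(). The sketch below assumes the data is rebuilt compactly with data.frame(), and adds a tapply() cross-check whose name (check) is purely illustrative.

```r
# Rebuild the sample data in compact form (same values as df1)
df1 <- data.frame(
  i.d   = factor(letters[1:9]),
  group = c(1L, 1L, 2L, 1L, 3L, 3L, 2L, 2L, 1L),
  cat   = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, NA)
)

# Per-group percentage of 1s (NA ignored), broadcast to every row, rounded to 2 dp
df1$pc.cat <- round(ave(df1$cat, df1$group,
                        FUN = function(x) 100 * mean(x, na.rm = TRUE)), 2)

# Cross-check: one value per group, named by group ID
check <- round(100 * tapply(df1$cat, df1$group, mean, na.rm = TRUE), 2)
check
```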

Upvotes: 2
