Reputation: 8474
For a sample dataframe:
df1 <- structure(list(i.d = structure(1:9, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i"), class = "factor"), group = c(1L,
1L, 2L, 1L, 3L, 3L, 2L, 2L, 1L), cat = c(0L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, NA)), .Names = c("i.d", "group", "cat"), class = "data.frame", row.names = c(NA,
-9L))
I wish to add an additional column to my dataframe ("pc.cat") which records the percentage '1s' in column cat BY the group ID variable.
For example, there are four values in group 1 (i.d's a, b, d and i). Value 'i' is NA so this can be ignored for now. Only one of the three values left is one, so the percentage would read 33.33 (to 2 dp). This value will be populated into column 'pc.cat' next to all the rows with '1' in the group (even the NA columns). The process would then be repeated for the other groups (2 and 3).
If anyone could help me with the code for this I would greatly appreciate it.
Upvotes: 2
Views: 525
Reputation: 21497
library(data.table)
setDT(df1)
df1[!is.na(cat), mean(cat), by=group]
Upvotes: 2
Reputation: 6362
With data.table:
library(data.table)
DT <- data.table(df1)
DT[, list(sum(na.omit(cat))/length(cat)), by = "group"]
Upvotes: 1
Reputation: 44309
This can be accomplished with the ave
function:
df1$pc.cat <- ave(df1$cat, df1$group, FUN=function(x) 100*mean(na.omit(x)))
df1
# i.d group cat pc.cat
# 1 a 1 0 33.33333
# 2 b 1 0 33.33333
# 3 c 2 1 66.66667
# 4 d 1 1 33.33333
# 5 e 3 0 0.00000
# 6 f 3 0 0.00000
# 7 g 2 1 66.66667
# 8 h 2 0 66.66667
# 9 i 1 NA 33.33333
Upvotes: 2