Reputation: 829
Assume I have the following data frame. I need to compute the percentage of ages under 18, grouped by ID and Group. For example, I need 50% for ID 1, Group a, and 0% for ID 3, Group e. I could do it in two steps, by counting all ages and counting the ages under 18 and then merging the two frames together, but I want to know whether I can do it in one step.
library(dplyr)
# df is the data frame shown below (columns ID, Group, Age)
a <- df %>% group_by(ID, Group) %>% summarize(countAllData = n())
b <- df %>% group_by(ID, Group) %>% filter(Age < 18) %>% summarize(countUnder18 = n())
final <- merge(a, b, by = c("ID", "Group"), all = TRUE)
final[is.na(final)] <- 0  # groups with no one under 18 are missing from b
roundedPercentage <- round(final$countUnder18 / final$countAllData * 100)
cbind(final, roundedPercentage)
Any suggestions?
ID Group Age
1 a 20
1 a 17
1 b 16
2 c 23
2 c 11
2 d 12
3 e 20
Upvotes: 0
Views: 90
Reputation: 269311
Take the mean of the indicator variable Age < 18. The last line (as.data.frame) is optional, but the output in this example looks a bit better if you use it.
library(dplyr)
DF %>%
group_by(ID, Group) %>%
summarize("%Under18" = round(100 * mean(Age < 18))) %>%
ungroup %>%
as.data.frame
giving:
ID Group %Under18
1 1 a 50
2 1 b 100
3 2 c 50
4 2 d 100
5 3 e 0
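Why the mean of Age < 18 is the proportion under 18: in arithmetic R treats TRUE as 1 and FALSE as 0, so sum() of a logical vector is the number of TRUE values and mean() is that count divided by the length. A small check using the two ages of ID 1, Group a (ages and under18 are just illustrative names):

```r
# Ages of ID 1, Group "a" in the sample data
ages <- c(20, 17)

under18 <- ages < 18          # logical vector: FALSE TRUE
sum(under18)                  # TRUE counts as 1, so this is the count: 1
mean(under18)                 # count / length = 1 / 2 = 0.5
round(100 * mean(under18))    # as a percentage: 50
```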
The input in reproducible form:
Lines <- "
ID Group Age
1 a 20
1 a 17
1 b 16
2 c 23
2 c 11
2 d 12
3 e 20"
DF <- read.table(text = Lines, header = TRUE)
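If you also want to keep the two counts from the question's merge approach, all three columns can be computed in a single summarize() call, because a summary can refer to summaries defined earlier in the same call. A sketch, assuming dplyr 1.0.0 or later (needed for the .groups argument); the sample data is rebuilt so the block runs on its own, and res is only an illustrative name:

```r
library(dplyr)

# Sample data from the question
DF <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 3),
  Group = c("a", "a", "b", "c", "c", "d", "e"),
  Age   = c(20, 17, 16, 23, 11, 12, 20)
)

res <- DF %>%
  group_by(ID, Group) %>%
  summarize(
    countAllData      = n(),              # rows per ID/Group
    countUnder18      = sum(Age < 18),    # TRUE counts as 1
    percentageUnder18 = round(100 * countUnder18 / countAllData),
    .groups = "drop"                      # return an ungrouped result
  ) %>%
  as.data.frame()
res
```

Groups with nobody under 18 (ID 3, Group e) get countUnder18 = 0 directly, so neither the merge nor the NA replacement from the question is needed.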
Upvotes: 1
Reputation: 51582
You can use aggregate, for example:
aggregate(Age ~ ID+Group, df, FUN = function(i) sum(i < 18)/length(i))
which gives,
ID Group Age
1 1 a 0.5
2 1 b 1.0
3 2 c 0.5
4 2 d 1.0
5 3 e 0.0
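The result above is a proportion, while the question asks for a percentage. A base R sketch of the same aggregate call that returns whole-number percentages instead; it rebuilds the sample data as DF, uses mean(i < 18) (equivalent to sum(i < 18)/length(i)), and renames the result column to PercentUnder18, a name chosen here only for illustration:

```r
# Sample data from the question
DF <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 3),
  Group = c("a", "a", "b", "c", "c", "d", "e"),
  Age   = c(20, 17, 16, 23, 11, 12, 20)
)

# Proportion under 18 per ID/Group, scaled to a rounded percentage
pct <- aggregate(Age ~ ID + Group, data = DF,
                 FUN = function(i) round(100 * mean(i < 18)))

# aggregate keeps the name of the summarized column ("Age"); rename it
names(pct)[names(pct) == "Age"] <- "PercentUnder18"
pct
```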
Upvotes: 1