Iman

Reputation: 829

Calculate percentage in Dataframe according to a condition

Assume I have the following data frame. I need the percentage of ages under 18, grouped by ID and Group. For example, the result should be 50% for ID 1, Group a, and 0% for ID 3. I could do it in two steps, counting all rows and counting the rows with an age under 18, then merging the two frames together, but I would like to know whether it can be done in one step.

library(dplyr)

# df is the data frame shown below
a <- df %>% group_by(ID, Group) %>% summarize(countAllData = n())
b <- df %>% group_by(ID, Group) %>% filter(Age < 18) %>% summarize(countUnder18 = n())
final <- merge(a, b, by = c("ID", "Group"), all = TRUE)
final[is.na(final)] <- 0   # groups with nobody under 18 get a count of 0
roundedPercentage <- round((final$countUnder18 / final$countAllData) * 100)
final <- cbind(final, roundedPercentage)

Any suggestion?

ID Group Age
1  a      20
1  a      17 
1  b      16
2  c      23
2  c      11
2  d      12
3  e      20

Upvotes: 0

Views: 90

Answers (2)

G. Grothendieck

Reputation: 269311

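Take the mean of the indicator variable Age < 18. Because TRUE counts as 1 and FALSE as 0, that mean is the proportion of rows under 18 in each group. The last line is optional, but the output in this example looks a bit better if you use it.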

library(dplyr)

DF %>% 
   group_by(ID, Group) %>% 
   summarize("%Under18" = round(100 * mean(Age < 18))) %>% 
   ungroup %>%
   as.data.frame

giving:

  ID Group %Under18
1  1     a       50
2  1     b      100
3  2     c       50
4  2     d      100
5  3     e        0
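
To see why the mean works here, a minimal base R sketch may help. The vector name ages is hypothetical and holds the two ages of ID 1, Group a from the example data. Comparing it with 18 gives a logical vector, and mean() treats TRUE as 1 and FALSE as 0:

```r
# Hypothetical vector: the two ages for ID 1, Group a in the example data
ages <- c(20, 17)

# The comparison yields a logical indicator vector
under18 <- ages < 18        # FALSE TRUE

# mean() coerces TRUE/FALSE to 1/0, so this is the share of TRUE values
pct <- round(100 * mean(under18))
print(pct)                  # 50
```

The dplyr pipeline above applies the same calculation once per (ID, Group) combination.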

Note

The input in reproducible form:

Lines <- "
ID Group Age
1  a      20
1  a      17 
1  b      16
2  c      23
2  c      11
2  d      12
3  e      20"
DF <- read.table(text = Lines, header = TRUE)

Upvotes: 1

Sotos

Reputation: 51582

You can also use aggregate from base R, which here returns the proportion rather than the percentage:

aggregate(Age ~ ID+Group, df, FUN = function(i) sum(i < 18)/length(i))

which gives,

   ID Group Age
1  1     a 0.5
2  1     b 1.0
3  2     c 0.5
4  2     d 1.0
5  3     e 0.0
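
If a rounded percentage is wanted instead of a proportion, the same call can be adapted. This is only a sketch under a few assumptions: the data frame is named df, and the column name PctUnder18 is illustrative and does not come from the question.

```r
# Example data from the question (the name df is an assumption)
df <- read.table(text = "
ID Group Age
1  a      20
1  a      17
1  b      16
2  c      23
2  c      11
2  d      12
3  e      20", header = TRUE)

# Same aggregate call, scaled to a rounded percentage
res <- aggregate(Age ~ ID + Group, df,
                 FUN = function(i) round(100 * mean(i < 18)))

# aggregate keeps the name "Age" for the result; rename it for clarity
names(res)[names(res) == "Age"] <- "PctUnder18"
print(res)
```

Renaming the column avoids the slightly misleading Age header on what is now a percentage.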

Upvotes: 1
