den
den

Reputation: 169

R count and substract events from a data frame

I am trying to calculate the families sizes from a data frame, which also contains two types of events : family members who died, and those who left the family. I would like to take into account these two parameters in order to compute the actual family size. Here is a reproductive example of my problem, with 3 families only :

family <- factor(rep(c("001","002","003"), c(10,8,15)), levels=c("001","002","003"), labels=c("001","002","003"), ordered=TRUE)
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)  ; DF

I could count N = total family members (in each family) in a second dataframe DF2, by simply using table()

DF2 <- with(DF, data.frame(table(family)))
colnames(DF2)[2] <- "N"   ; DF2
family  N
1    001 10
2    002  8
3    003 15

But i can not find a proper way to get the actual number of people (for example, creating a new variable N2 into DF2) , calculated by substracting to N the number of members who died or left the family. I suppose i have to relate the two dataframes DF and DF2 in a way. i have looked for other related questions in this site but could not find the right answer... If anyone has a good idea, it would be great ! Thank you in advance.. Deni

Upvotes: 2

Views: 316

Answers (3)

den
den

Reputation: 169

I finally found another which works fine (from another post), allowing to compute everything from the original DF table. This uses the ddply function :

DF <- ddply(DF,.(family),transform,total=length(family)) DF <- ddply(DF,.(family),transform,actual=length(family)-sum(dead=="1")-sum(left=="1")) DF

Thanks a lot to everyone who helped ! Deni

Upvotes: 0

joel.wilson
joel.wilson

Reputation: 8413

Logic : First we want to group_by(family) and then calculate 2 numbers : i) total #obs in each group ii) subtract the sum(dead) + sum(left) from this total .

In dplyr package : n() helps us get the total #observations in each group

In data.table : .N does the same above job

library(dplyr)
DF %>% group_by(family) %>% summarise( total = n(), current = n()-sum(dead,left, na.rm = TRUE))
#  family total current
#  (fctr) (int)   (dbl)
#1    001    10       6
#2    002     8       4
#3    003    15       7


library(data.table)
# setDT() is preferred if incase your data was a data.frame. else just DF.
setDT(DF)[, .(total = .N, current = .N - sum(dead, left, na.rm = TRUE)), by = family]
#   family total current
#1:    001    10       6
#2:    002     8       4
#3:    003    15       7

Upvotes: 2

akrun
akrun

Reputation: 887251

Here is a base R option

do.call(data.frame, aggregate(dl~family, transform(DF, dl = dead + left), 
      FUN = function(x) c(total=length(x), current=length(x) - sum(x))))

Or a modified version is

transform(aggregate(. ~ family, transform(DF, total = 1,
  current = dead + left)[c(1,4:5)], FUN = sum), current = total - current)
#     family total current
#1    001    10       6
#2    002     8       4
#3    003    15       7

Upvotes: 2

Related Questions