Werner Hertzog

Reputation: 2022

Counting incidences from one data frame, entering results into a different data frame

I have two data frames: households and individuals.

This is households:

structure(list(ID = 1:5), class = "data.frame", row.names = c(NA, 
-5L))

This is individuals:

structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
3L, 4L, 4L, 4L, 4L, 5L, 5L), Yesno = c(1L, 0L, 1L, 0L, 0L, 0L, 
1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-17L))

I'm trying to add a new column to households that counts, for each ID, the number of rows in individuals where Yesno is equal to 1.

I have tried

households$Count <- as.numeric(ave(individuals$Yesno[individuals$Yesno == 1], households$ID, FUN = count))

households should look like this:

ID  Count
1   2
2   3
3   0
4   2
5   1

Upvotes: 1

Views: 54

Answers (2)

Ronak Shah

Reputation: 388982

Another base R approach uses sapply: loop over each ID in households, subset the rows of individuals with that ID, and count how many of them have 1 in the Yesno column.

households$Count <- sapply(households$ID, function(x) 
                   sum(individuals$Yesno[individuals$ID == x] == 1))

households
#  ID Count
#1  1     2
#2  2     3
#3  3     0
#4  4     2
#5  5     1

The == 1 comparison in the function can be removed if the Yesno column contains only 0s and 1s, because sum then counts the 1s directly.
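As a minimal sketch of that simplification, assuming Yesno holds only 0 and 1 with no NA values: the comparison is dropped and the values are summed directly. Using vapply in place of sapply is an assumption added here to fix the return type to a single integer per ID; it is not required for the approach to work.

```r
# Sample data from the question
households <- data.frame(ID = 1:5)
individuals <- data.frame(
  ID    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Yesno = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)
)

# For each household ID, sum Yesno over the matching rows of individuals.
# With 0/1 values the sum equals the number of 1s.
households$Count <- vapply(
  households$ID,
  function(x) sum(individuals$Yesno[individuals$ID == x]),
  integer(1)  # each call must return exactly one integer
)

households
```

If Yesno could contain NA, sum(..., na.rm = TRUE) would probably be the safer call.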

Upvotes: 2

Maurits Evers

Reputation: 50678

Option 1: In base R

Using merge and aggregate

aggregate(Yesno ~ ID, merge(households, individuals), FUN = sum)
#  ID Yesno
#1  1     2
#2  2     3
#3  3     0
#4  4     2
#5  5     1
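The result above is a separate data frame whose count column is named Yesno. One possible way to attach it to households as Count is sketched below. The intermediate name counts is hypothetical, and all.x = TRUE plus the NA replacement are assumptions added to keep households that have no rows in individuals; the sample data has no such household, so that step changes nothing here.

```r
# Sample data from the question
households <- data.frame(ID = 1:5)
individuals <- data.frame(
  ID    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Yesno = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)
)

# Sum Yesno per ID, then rename the summed column to Count
counts <- aggregate(Yesno ~ ID, data = individuals, FUN = sum)
names(counts)[names(counts) == "Yesno"] <- "Count"

# Left join: keep every household, even one without individuals
households <- merge(households, counts, by = "ID", all.x = TRUE)

# Households without any individuals would get NA; treat that as zero
households$Count[is.na(households$Count)] <- 0L

households
```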

Option 2: With dplyr

Using left_join and group_by+summarise

library(dplyr)
left_join(households, individuals) %>%
    group_by(ID) %>%
    summarise(Count = sum(Yesno))
#Joining, by = "ID"
## A tibble: 5 x 2
#     ID Count
#  <int> <int>
#1     1     2
#2     2     3
#3     3     0
#4     4     2
#5     5     1

Option 3: With data.table

library(data.table)
setDT(households)
setDT(individuals)
households[individuals, on = "ID"][, .(Count = sum(Yesno)), by = ID]
#   ID Count
#1:  1     2
#2:  2     3
#3:  3     0
#4:  4     2
#5:  5     1

Sample data

households <- structure(list(ID = 1:5), class = "data.frame", row.names = c(NA,
-5L))

individuals <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 4L, 4L, 4L, 4L, 5L, 5L), Yesno = c(1L, 0L, 1L, 0L, 0L, 0L,
1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-17L))

Upvotes: 5
