Reputation: 1
So I have two dataframes -
person.data
person_id | date | present_absent
1 12/9 p
2 12/9 a
3 12/9 p
1 20/9 a
2 20/9 a
3 20/9 a
...
daily.data
date | week_code | absence_count
12/9 A
20/9 B
...
I need to work out for each date in the daily.data frame how many 'a' codes are recorded in the person.data frame and record the result in the absence_count column of the daily.data frame.
Part of the problem is I don't know how to properly word the question without drawing an example. My last 4 hours of google and stackoverflow searches have showed me how to count the number of 'a' codes for a specific date using:
sum(person.data$date == 12/9 & person.data$present_absent == "a")
but I can't figure out how to get it to compare person.data$date with each daily.data$date. I could fall back on Excel and use a COUNTIFS() formula but I really want to understand R.
Any help at all would be greatly received, thanks in advance.
Upvotes: 0
Views: 1338
Reputation: 611
You can use the aggregate and %in% function in R.
# Your first dataset
person.data <- data.frame("persion_id" = c(1,2,3,1,2,3),
"date" = c("12/9", "12/9", "12/9","20/9", "20/9", "20/9"),
"present_absent" = c("p", "a", "p", "a", "a", "a"))
# Your second dataset you want to populate
daily.data <- data.frame("date" = c("12/9", "20/9"),
"week_code" = c("A", "B"))
# Summarize the attendance for all dates
summary <- aggregate(as.integer(person.data$present_absent),
by = list(person.data$date, person.data$present_absent), FUN = sum)
# Get only the absent records from summary
daily.data$absence_count <- summary[(daily.data$date %in% person.data$date &
summary[,2] == "a"), 3]
Upvotes: 2
Reputation: 146164
A dplyr
solution:
library(dplyr)
inner_join(person.data, daily.data) %>%
group_by(date, week_code) %>%
summarize(absence_count = sum(present_absent == 'a'))
Upvotes: 0