Sam Jukes
Sam Jukes

Reputation: 1

R: Count rows in dataframe based on criteria from a different dataframe

So I have two dataframes -

person.data

person_id  |  date  | present_absent
1             12/9    p
2             12/9    a
3             12/9    p
1             20/9    a
2             20/9    a
3             20/9    a
...

daily.data

date  | week_code |  absence_count
12/9    A
20/9    B
...

I need to work out for each date in the daily.data frame how many 'a' codes are recorded in the person.data frame and record the result in the absence_count column of the daily.data frame.

Part of the problem is I don't know how to properly word the question without drawing an example. My last 4 hours of google and stackoverflow searches have showed me how to count the number of 'a' codes for a specific date using:

sum(person.data$date == 12/9 & person.data$present_absent == "a")

but I can't figure out how to get it to compare person.data$date with each daily.data$date. I could fall back on Excel and use a COUNTIFS() formula but I really want to understand R.

Any help at all would be greatly received, thanks in advance.

Upvotes: 0

Views: 1338

Answers (2)

code_is_entropy
code_is_entropy

Reputation: 611

You can use the aggregate and %in% function in R.

# Your first dataset
person.data <- data.frame("persion_id" = c(1,2,3,1,2,3),
                      "date" = c("12/9", "12/9", "12/9","20/9", "20/9", "20/9"),
                      "present_absent" = c("p", "a", "p", "a", "a", "a"))

# Your second dataset you want to populate
daily.data <- data.frame("date" = c("12/9", "20/9"),
                     "week_code" = c("A", "B"))

# Summarize the attendance for all dates
summary <- aggregate(as.integer(person.data$present_absent),
              by = list(person.data$date, person.data$present_absent), FUN = sum)

# Get only the absent records from summary
daily.data$absence_count <- summary[(daily.data$date %in% person.data$date &
                                   summary[,2] == "a"), 3]

Upvotes: 2

Gregor Thomas
Gregor Thomas

Reputation: 146164

A dplyr solution:

library(dplyr)
inner_join(person.data, daily.data) %>%
  group_by(date, week_code) %>%
  summarize(absence_count = sum(present_absent == 'a'))

Upvotes: 0

Related Questions