erikfjonsson
erikfjonsson

Reputation: 197

How to perform "countif" for several rows in data frame?

I have two data frames, A and B. In B I have two key columns and for each row in B I need to count the number of rows in A matching those keys.

I know how to solve the problem using a for-loop but it's taking forever and i was wondering if there is a smarter way to do it. I am still quite new to R so you'll have to forgive me if I'm missing some obvious solution.

The data frames have the following strucuture. Of course, in reality the data frames are much larger.

A <- data.frame(c(1, 2, 1), c(2, 1, 2), c("alpha", "bravo", "charlie")) 
colnames(A) <- c("key1", "key2", "value")

B <- data.frame(c(1, 2, 3), c(2, 1, 3), NA)
colnames(B) <- c("key1", "key2", "count")

I used the following for-loop and got the correct result.

for (i in 1:nrow(B)) {
  B$count[i] <- sum(A$key1 == B$key1[i] & A$key2 == B$key2[i], na.rm = TRUE)
}

However, the code took a quite while to run and i suspect there is a better way to do this. I would appreciate any help!

Upvotes: 4

Views: 73

Answers (3)

Sotos
Sotos

Reputation: 51592

Here is an idea using merge,

library(tidyverse)

A %>% 
 full_join(B, by = c('key1', 'key2')) %>% 
 group_by(key1, key2) %>% 
 summarise(count = sum(!is.na(value)))

which gives,

# A tibble: 3 x 3
# Groups:   key1 [?]
   key1  key2 count
  <dbl> <dbl> <int>
1     1     2     2
2     2     1     1
3     3     3     0

Adding the data.table solution for completion,

library(data.table)

setDT(A)[setDT(B), on = c('key1', 'key2')][, 
          .(count = sum(!is.na(value))), by = c('key1', 'key2')]

#   key1 key2 count
#1:    1    2     2
#2:    2    1     1
#3:    3    3     0

Upvotes: 3

akrun
akrun

Reputation: 887118

An option in base R (similar to @Sotos tidyverse option)

aggregate(cbind(count = !is.na(value)) ~ key1 + key2, merge(A, B, all = TRUE), sum)
#    key1 key2 count
#1    2    1     1
#2    1    2     2
#3    3    3     0

Or with data.table

library(data.table)
setDT(A)[B, .(count = sum(!is.na(value))), on = .(key1, key2), by = .EACHI]
#   key1 key2 count
#1:    1    2     2
#2:    2    1     1
#3:    3    3     0

Upvotes: 4

Ronak Shah
Ronak Shah

Reputation: 388982

Using the same logic as you but without a loop using mapply

B$count <- mapply(function(x, y) 
      sum(x == A$key1 & y == A$key2, na.rm = TRUE), B$key1, B$key2)

B
#  key1 key2 count
#1    1    2     2
#2    2    1     1
#3    3    3     0

Upvotes: 3

Related Questions