Reputation: 197
I have two data frames, A and B. In B I have two key columns and for each row in B I need to count the number of rows in A matching those keys.
I know how to solve the problem using a for-loop but it's taking forever and i was wondering if there is a smarter way to do it. I am still quite new to R so you'll have to forgive me if I'm missing some obvious solution.
The data frames have the following strucuture. Of course, in reality the data frames are much larger.
A <- data.frame(c(1, 2, 1), c(2, 1, 2), c("alpha", "bravo", "charlie"))
colnames(A) <- c("key1", "key2", "value")
B <- data.frame(c(1, 2, 3), c(2, 1, 3), NA)
colnames(B) <- c("key1", "key2", "count")
I used the following for-loop and got the correct result.
for (i in 1:nrow(B)) {
B$count[i] <- sum(A$key1 == B$key1[i] & A$key2 == B$key2[i], na.rm = TRUE)
}
However, the code took a quite while to run and i suspect there is a better way to do this. I would appreciate any help!
Upvotes: 4
Views: 73
Reputation: 51592
Here is an idea using merge
,
library(tidyverse)
A %>%
full_join(B, by = c('key1', 'key2')) %>%
group_by(key1, key2) %>%
summarise(count = sum(!is.na(value)))
which gives,
# A tibble: 3 x 3 # Groups: key1 [?] key1 key2 count <dbl> <dbl> <int> 1 1 2 2 2 2 1 1 3 3 3 0
Adding the data.table
solution for completion,
library(data.table)
setDT(A)[setDT(B), on = c('key1', 'key2')][,
.(count = sum(!is.na(value))), by = c('key1', 'key2')]
# key1 key2 count
#1: 1 2 2
#2: 2 1 1
#3: 3 3 0
Upvotes: 3
Reputation: 887118
An option in base R
(similar to @Sotos tidyverse option)
aggregate(cbind(count = !is.na(value)) ~ key1 + key2, merge(A, B, all = TRUE), sum)
# key1 key2 count
#1 2 1 1
#2 1 2 2
#3 3 3 0
Or with data.table
library(data.table)
setDT(A)[B, .(count = sum(!is.na(value))), on = .(key1, key2), by = .EACHI]
# key1 key2 count
#1: 1 2 2
#2: 2 1 1
#3: 3 3 0
Upvotes: 4
Reputation: 388982
Using the same logic as you but without a loop using mapply
B$count <- mapply(function(x, y)
sum(x == A$key1 & y == A$key2, na.rm = TRUE), B$key1, B$key2)
B
# key1 key2 count
#1 1 2 2
#2 2 1 1
#3 3 3 0
Upvotes: 3