Reputation: 1875
I have two data.frames dfA
and dfB
. Both of them have a column called key
.
Now I'd like to know how many duplicates for A$key there are in B$key.
A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))
It should be A=2, B=3, C=0 and D=1. Whats the most easiest way to do this?
Upvotes: 1
Views: 58
Reputation: 72593
Actually you mean how many occurrences of each value of A$key
you have in B$key
?
You can obtain this by coding B$key
as factor with the unique values of A$key
as levels.
o <- table(factor(B$key, levels=unique(A$key)))
Yielding:
> o
A B C D
2 3 0 1
If you really want to count duplicates, do
dupes <- ifelse(o - 1 < 0, 0, o - 1)
Yielding:
> dupes
A B C D
1 2 0 0
Upvotes: 1
Reputation: 39858
Using tidyverse
you can do:
A %>%
left_join(B %>% #Merging df A with df B for which the count in "key" was calculated
group_by(key) %>%
tally(), by = c("key" = "key")) %>%
mutate(n = ifelse(is.na(n), 0, n)) #Replacing NA with 0
key n
1 A 2
2 B 3
3 C 0
4 D 1
Upvotes: 2
Reputation: 16121
A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))
library(dplyr)
library(tidyr)
B %>%
filter(key %in% A$key) %>% # keep values that appear in A
count(key) %>% # count values
complete(key = A$key, fill = list(n = 0)) # add any values from A that don't appear
# # A tibble: 4 x 2
# key n
# <chr> <dbl>
# 1 A 2
# 2 B 3
# 3 C 0
# 4 D 1
Upvotes: 2
Reputation: 26343
Use table
table(factor(B$key, levels = sort(unique(A$key))))
#A B C D
#2 3 0 1
factor
is needed here such that we also 'count' entries that do not appear in B$key
, that is C
.
Upvotes: 4