D. Studer
D. Studer

Reputation: 1875

Count number of duplicates in other dataframe

I have two data.frames dfA and dfB. Both of them have a column called key. Now I'd like to know how many duplicates for A$key there are in B$key.

A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))

It should be A=2, B=3, C=0 and D=1. Whats the most easiest way to do this?

Upvotes: 1

Views: 58

Answers (4)

jay.sf
jay.sf

Reputation: 72593

Actually you mean how many occurrences of each value of A$key you have in B$key?

You can obtain this by coding B$key as factor with the unique values of A$key as levels.

o <- table(factor(B$key, levels=unique(A$key)))

Yielding:

> o

A B C D 
2 3 0 1 

If you really want to count duplicates, do

dupes <- ifelse(o - 1 < 0, 0, o - 1)

Yielding:

> dupes

A B C D 
1 2 0 0 

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 39858

Using tidyverse you can do:

A %>%
 left_join(B %>% #Merging df A with df B for which the count in "key" was calculated
            group_by(key) %>%
            tally(), by = c("key" = "key")) %>%
 mutate(n = ifelse(is.na(n), 0, n)) #Replacing NA with 0

  key n
1   A 2
2   B 3
3   C 0
4   D 1

Upvotes: 2

AntoniosK
AntoniosK

Reputation: 16121

A <- data.frame(key=c("A", "B", "C", "D"))
B <- data.frame(key=c("A", "A", "B", "B", "B", "D"))

library(dplyr)
library(tidyr)

B %>%
  filter(key %in% A$key) %>%                 # keep values that appear in A
  count(key) %>%                             # count values
  complete(key = A$key, fill = list(n = 0))  # add any values from A that don't appear

# # A tibble: 4 x 2
#   key       n
#   <chr> <dbl>
# 1 A         2
# 2 B         3
# 3 C         0
# 4 D         1

Upvotes: 2

markus
markus

Reputation: 26343

Use table

table(factor(B$key, levels = sort(unique(A$key))))
#A B C D 
#2 3 0 1

factor is needed here such that we also 'count' entries that do not appear in B$key, that is C.

Upvotes: 4

Related Questions