Reputation: 630
I have a data frame with two columns, id_1 and id_2. For each value of id_1, I want to count how many times it is paired with each value of id_2.
I imagine the result being a data frame with columns id_1, id_2 and number_of_id_2_found_for_id_1.
Here's what I'm trying:
set.seed(1)
df <- data.frame(
id_1 = sample(1:10, size = 30, replace = TRUE),
id_2 = sample(1:10, size = 30, replace = TRUE)
)
df %>% group_by(id_1, id_2) %>%
# id_1 should be unique
summarise(~n(.x)) # I want this to be the number of id_2 it has found for each of the elements of id_1
My expected output would be:
1 1 0
1 2 0
1 3 0
1 4 1
1 5 0
....
1 9 0
2 1 0
...
2 7 1
2 8 0
2 9 1
And so on: for each id_1, the number of matches it has with each id_2. In the example above the count is mostly 0 or 1, but in a much larger data frame it would increase. This is like a bipartite graph between id_1 and id_2, where each edge weight is the number of left-to-right matches between an id_1 value and an id_2 value.
Thanks in advance!
Upvotes: 1
Views: 59
Reputation: 886938
Based on the updated post, we may need to use crossing to return all the combinations, do a count on the original dataset for both columns, and then join the counts onto the full set of combinations.
library(dplyr)
library(tidyr)
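crossing(id_1 = 1:10, id_2 = 1:10) %>%
  # attach the observed pair counts; pairs absent from df get NA
  left_join(df %>% count(id_1, id_2), by = c("id_1", "id_2")) %>%
  # pairs that never occur get a count of 0
  mutate(n = replace_na(n, 0))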
-output
# A tibble: 100 x 3
# id_1 id_2 n
# <int> <int> <dbl>
# 1 1 1 0
# 2 1 2 0
# 3 1 3 1
# 4 1 4 1
# 5 1 5 0
# 6 1 6 0
# 7 1 7 0
# 8 1 8 0
# 9 1 9 1
#10 1 10 0
# … with 90 more rows
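For comparison, the same full grid of counts can probably be built in base R without any joins. This is only a sketch, not part of the approach above: it assumes both ids range over 1:10 as in the example data, and the names ids, counts and result are made up for illustration.

```r
# Recreate the example data from the question.
set.seed(1)
df <- data.frame(
  id_1 = sample(1:10, size = 30, replace = TRUE),
  id_2 = sample(1:10, size = 30, replace = TRUE)
)

# Assumption: both ids range over 1:10, as in the example.
ids <- 1:10

# Fixing the factor levels makes table() keep pairs that never occur (count 0).
counts <- table(
  id_1 = factor(df$id_1, levels = ids),
  id_2 = factor(df$id_2, levels = ids)
)

# Long format: one row per (id_1, id_2) pair, with the count in column n.
result <- as.data.frame(counts, responseName = "n")

# as.data.frame() returns the ids as factors; convert them back to integers.
result$id_1 <- as.integer(as.character(result$id_1))
result$id_2 <- as.integer(as.character(result$id_2))

# Sort by id_1, then id_2, to match the layout in the question.
result <- result[order(result$id_1, result$id_2), ]
head(result)
```

The intermediate counts object is itself a 10 x 10 contingency table, which may be convenient if the counts are meant to be used as edge weights of the bipartite graph mentioned in the question.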
Upvotes: 1