Nikola

Reputation: 630

Counting frequencies with dplyr

I have a data frame with two columns, id_1 and id_2. For each value of id_1, I want to count how many times it occurs together with each value of id_2.

I imagine the result being a data frame with columns id_1, id_2 and number_of_id_2_found_for_id_1.

Here's what I'm trying:

library(dplyr)

set.seed(1)
df <- data.frame(
  id_1 = sample(1:10, size = 30, replace = TRUE),
  id_2 = sample(1:10, size = 30, replace = TRUE)
)

df %>%
  group_by(id_1, id_2) %>%
  # number of rows for each (id_1, id_2) pair;
  # pairs that never occur are missing instead of showing 0
  summarise(n = n(), .groups = "drop")

My expected output would be:

1 1 0
1 2 0
1 3 0
1 4 1
1 5 0
....
1 9 0
2 1 0
...
2 7 1
2 8 0
2 9 1

And so on: for each id_1, the number of times it occurs with each id_2, including 0 for pairs that never occur. In this small example the counts are mostly 0 or 1, but in a much larger data frame they would be higher. This is like a weighted bipartite graph between id_1 and id_2, where the weight of each edge is the number of rows in which that id_1 appears together with that id_2.
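Since the counts form a weighted bipartite graph, the same numbers can also be read off a 10 x 10 contingency table (the graph's weighted adjacency matrix). A minimal base R sketch, assuming the ids range over 1:10 as in the sample data; it swaps in table() instead of dplyr, and the names tab and counts are just illustrative:

```r
# Sample data as in the question
set.seed(1)
df <- data.frame(
  id_1 = sample(1:10, size = 30, replace = TRUE),
  id_2 = sample(1:10, size = 30, replace = TRUE)
)

# Factors with all levels 1:10 make table() keep pairs that never occur (count 0)
tab <- table(
  id_1 = factor(df$id_1, levels = 1:10),
  id_2 = factor(df$id_2, levels = 1:10)
)

# Long format: one row per (id_1, id_2) pair, count in column n
counts <- as.data.frame(tab, responseName = "n")

# Convert the factor columns back to integers
counts$id_1 <- as.integer(as.character(counts$id_1))
counts$id_2 <- as.integer(as.character(counts$id_2))

# Sort by id_1, then id_2, like the expected output
counts <- counts[order(counts$id_1, counts$id_2), ]
rownames(counts) <- NULL
head(counts)
```

The matrix form (tab) is convenient if you later want to treat the data as a graph; the long form (counts) matches the three-column layout described above.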

Thanks in advance!

Upvotes: 1

Views: 59

Answers (1)

akrun

Reputation: 886938

Based on the updated post, we may need to use crossing() to generate all combinations of id_1 and id_2, count() the original dataset by both columns, and join the counts onto the full set of combinations, replacing the missing counts with 0.

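library(dplyr)
library(tidyr)

# all 10 x 10 combinations of id_1 and id_2
crossing(id_1 = 1:10, id_2 = 1:10) %>%
  # attach the observed count of each pair (NA where the pair never occurs)
  left_join(df %>% count(id_1, id_2), by = c("id_1", "id_2")) %>%
  # pairs that never occur get a count of 0
  mutate(n = replace_na(n, 0))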

Output:

# A tibble: 100 x 3
#    id_1  id_2     n
#   <int> <int> <dbl>
# 1     1     1     0
# 2     1     2     0
# 3     1     3     1
# 4     1     4     1
# 5     1     5     0
# 6     1     6     0
# 7     1     7     0
# 8     1     8     0
# 9     1     9     1
#10     1    10     0
# … with 90 more rows
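A possible alternative that skips the explicit crossing() and join is tidyr's complete(), which adds the missing combinations to the counted data and fills their n. A minimal sketch, assuming the same 1:10 id range and a tidyr version whose complete() accepts name = vector specifications and a fill list; res is just an illustrative name:

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- data.frame(
  id_1 = sample(1:10, size = 30, replace = TRUE),
  id_2 = sample(1:10, size = 30, replace = TRUE)
)

res <- df %>%
  # count observed (id_1, id_2) pairs
  count(id_1, id_2) %>%
  # add every missing pair from 1:10 x 1:10 with n = 0
  complete(id_1 = 1:10, id_2 = 1:10, fill = list(n = 0L)) %>%
  # sort by id_1, then id_2
  arrange(id_1, id_2)

res
```

Both versions produce the same 100 rows; complete() just bundles the "generate all combinations, join, fill with 0" steps into one call.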

Upvotes: 1
