Morts81
Morts81

Reputation: 439

Count rows containing combinations in a dataframe in R

Assuming I have the following dataframe which is made up of characters 'a' through 'e':

df <- data.frame(P1 = c("a","b","c"), P2 = c("b","c","d"), P3 = c("e","a","e"))

  P1 P2 P3
1  a  b  e
2  b  c  a
3  c  d  e

I want to count the number of dataframe rows that contain each combination (size 2) of 'a' through 'e'. So in this case there are 10 different (size 2) combinations of 'a' through 'e'.

t(combn(c("a","b","c","d","e"),2))

    [,1] [,2]
 [1,] "a"  "b" 
 [2,] "a"  "c" 
 [3,] "a"  "d" 
 [4,] "a"  "e" 
 [5,] "b"  "c" 
 [6,] "b"  "d" 
 [7,] "b"  "e" 
 [8,] "c"  "d" 
 [9,] "c"  "e" 
[10,] "d"  "e" 

2 rows have the combination "a" & "b", 1 row has the combination "a" & "c", no rows have the combination "a" & "d" and so on ......... ultimately I want a dataframe that looks like this:

   X1 X2 Count
1   a  b     2
2   a  c     1
3   a  d     0
4   a  e     ..
......... etc

Any assistance on this would be greatly appreciated, I was thinking dplyr or similar might have something suitable but haven't managed to work out an easy approach without a time consuming loop or similar.

Upvotes: 0

Views: 217

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389265

You can get the data in long format and create the combinations for each row, count the combinations and join it with data which we created for all combinations to get missing values.

library(dplyr) #dplyr >= 1.0.0
library(tidyr)

ref_data <- as.data.frame(t(combn(unique(unlist(df)),2)))

df %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row) %>%
  group_by(row) %>%
  summarise(value = combn(sort(value), 2, toString)) %>%
  separate(value, c('V1', 'V2'), sep = ", ") %>%
  ungroup %>%
  count(V1, V2) %>%
  right_join(ref_data, by = c('V1', 'V2')) %>%
  mutate(n = replace_na(n, 0))


# A tibble: 10 x 3
#   V1    V2        n
#   <chr> <chr> <dbl>
# 1 a     b         2
# 2 a     c         1
# 3 a     e         1
# 4 b     c         1
# 5 b     e         1
# 6 c     d         1
# 7 c     e         1
# 8 d     e         1
# 9 a     d         0
#10 b     d         0

Upvotes: 2

Related Questions