user35131

Reputation: 1134

From a network pair-frequency dataset, how do I assign to each group a count of its associations with the other groups?

Here is the dataset (stored as df)

df <- structure(list(V1 = structure(c(2L, 3L, 4L, 5L, 3L, 4L, 5L, 4L, 
  5L), .Label = c("ABC", "DEF", "GHI", "JKL", "MNO"), class = "factor"), 
  V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("ABC", 
  "DEF", "GHI", "JKL", "MNO"), class = "factor"), Freq = c(3L, 
  2L, 3L, 2L, 3L, 1L, 2L, 1L, 1L)), row.names = c(NA, 9L), class = "data.frame")


which looks like this

   V1  V2 Freq
1  DEF ABC    3
2  GHI ABC    2
3  JKL ABC    3
4  MNO ABC    2
5  GHI DEF    3
6  JKL DEF    1
7  MNO DEF    2
8  JKL GHI    1
9  MNO GHI    1

I would like to create something like this

x       y
ABC     4
DEF     4
GHI     4
JKL     3
MNO     3

Note that ABC is associated 4 times, with DEF, GHI, JKL, and MNO; this corresponds to rows 1, 2, 3, and 4. JKL is associated 3 times, with ABC, DEF, and GHI; this corresponds to rows 3, 6, and 8.
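
For example, the count for ABC can be checked directly by counting the rows in which ABC appears in either column; a minimal base R check, assuming the data above is stored as df:

# ABC appears in rows 1-4 (always in V2, never in V1), giving a count of 4
sum(df$V1 == "ABC" | df$V2 == "ABC")
#> [1] 4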

Upvotes: 3

Views: 74

Answers (5)

AnilGoyal

Reputation: 26218

The janitor package also helps:

janitor::tabyl(c(df$V1, df$V2))
#>  c(df$V1, df$V2) n   percent
#>              ABC 4 0.2222222
#>              DEF 4 0.2222222
#>              GHI 4 0.2222222
#>              JKL 3 0.1666667
#>              MNO 3 0.1666667

Created on 2021-07-03 by the reprex package (v2.0.0)
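
Note that this relies on c() preserving factor labels, which it does from R 4.1.0 on (where c() gained a factor method); on older versions the factors may need to be converted to character first. A minimal sketch of that variant:

# convert to character so the labels survive c() on R < 4.1.0 as well
janitor::tabyl(c(as.character(df$V1), as.character(df$V2)))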

Upvotes: 1

ThomasIsCoding

Reputation: 101568

We can try table + stack, dropping the Freq column and tabulating every label across the two remaining columns:

> rev(stack(table(unlist(df[-3]))))
  ind values
1 ABC      4
2 DEF      4
3 GHI      4
4 JKL      3
5 MNO      3

or an igraph option with degree, like below:

> library(igraph)
> rev(stack(degree(graph_from_data_frame(df))))
  ind values
1 DEF      4
2 GHI      4
3 JKL      3
4 MNO      3
5 ABC      4
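
Here graph_from_data_frame adds vertices in order of first appearance, so the degrees are not listed alphabetically; a small follow-up sketch that reorders the result by vertex name:

library(igraph)
res <- rev(stack(degree(graph_from_data_frame(df))))
# sort the rows alphabetically by vertex name to match the expected output
res[order(as.character(res$ind)), ]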

Upvotes: 2

Uwe

Reputation: 42544

If I understand correctly, the OP wants to count the occurrences of each item in both columns V1 and V2.

This can be solved by reshaping and counting the occurrences in long format:

library(data.table)
melt(setDT(df), measure.vars = patterns("^V"))[, .N, keyby = value]
   value N
1:   ABC 4
2:   DEF 4
3:   GHI 4
4:   JKL 3
5:   MNO 3
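
If the exact x/y column names from the question are wanted, the data.table result can be renamed in place; a minimal sketch using setnames:

library(data.table)
res <- melt(setDT(df), measure.vars = patterns("^V"))[, .N, keyby = value]
# rename the columns to the x/y layout shown in the question
setnames(res, c("value", "N"), c("x", "y"))[]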

or with dplyr & tidyr:

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(starts_with("V")) %>% 
  count(value)
# A tibble: 5 x 2
  value     n
  <fct> <int>
1 ABC       4
2 DEF       4
3 GHI       4
4 JKL       3
5 MNO       3
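
Because value stays a factor here, a level that happened to appear in neither column would be dropped by count() by default; a small hedged variant that keeps zero-count levels (the output is unchanged for this data, since all five levels occur):

library(dplyr)
library(tidyr)
# .drop = FALSE keeps factor levels with zero occurrences in the result
df %>% 
  pivot_longer(starts_with("V")) %>% 
  count(value, .drop = FALSE)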

Upvotes: 1

Anoushiravan R

Reputation: 21918

This solution can also be used:

library(dplyr)
library(purrr)
library(tibble)

df %>%
  select(V1, V2) %>%
  # collect every distinct label appearing in either column
  unlist() %>% 
  unique() %>% 
  enframe() %>%
  # for each label, count the rows of df in which it appears in V1 or V2
  mutate(freq = map_int(value, function(a) pmap_lgl(df[-3], ~ a %in% c(...)) %>% reduce(`+`))) %>%
  arrange(value) %>%
  select(-name)

# A tibble: 5 x 2
  value  freq
  <fct> <int>
1 ABC       4
2 DEF       4
3 GHI       4
4 JKL       3
5 MNO       3

Upvotes: 3

akrun

Reputation: 887168

We can do this in a couple of ways

  1. Split each column by the other into a list, get the lengths, add (+) them, and stack the result into a two-column data.frame:
stack(lengths(split(as.character(df$V2), df$V1)) + 
       lengths(split(as.character(df$V1), df$V2)))[2:1]

Output:

 ind values
1 ABC      4
2 DEF      4
3 GHI      4
4 JKL      3
5 MNO      3

  2. Or build a table of each column against the other, add (+) the two tables, take the rowSums, and stack it (same output as above):
stack(rowSums(table(df[2:1]) + table(df[1:2])))[2:1]

Upvotes: 2
