user35131

Reputation: 1134

From a network pair-frequency dataset, how do I assign to each group a count of its associations with the other groups?

Here is the dataset (stored as df)

df <- structure(list(V1 = structure(c(2L, 3L, 4L, 5L, 3L, 4L, 5L, 4L, 
  5L), .Label = c("ABC", "DEF", "GHI", "JKL", "MNO"), class = "factor"), 
  V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("ABC", 
  "DEF", "GHI", "JKL", "MNO"), class = "factor"), Freq = c(3L, 
  2L, 3L, 2L, 3L, 1L, 2L, 1L, 1L)), row.names = c(NA, 9L), class = "data.frame")


which looks like this

   V1  V2 Freq
1  DEF ABC    3
2  GHI ABC    2
3  JKL ABC    3
4  MNO ABC    2
5  GHI DEF    3
6  JKL DEF    1
7  MNO DEF    2
8  JKL GHI    1
9  MNO GHI    1

I would like to create something like this

x       y
ABC     4
DEF     4
GHI     4
JKL     3
MNO     3

Note that ABC is associated 4 times, with DEF, GHI, JKL, and MNO; this corresponds to rows 1, 2, 3, and 4. JKL is associated 3 times, with ABC, DEF, and GHI; this corresponds to rows 3, 6, and 8.
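
For example, the count for ABC can be checked directly by counting the rows in which ABC appears in either column; a minimal base R check, assuming the data above is stored as df:

# ABC appears in rows 1-4 (always in V2, never in V1), giving a count of 4
sum(df$V1 == "ABC" | df$V2 == "ABC")
#> [1] 4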

Upvotes: 3

Views: 74

Answers (5)

AnilGoyal

Reputation: 26218

The janitor package also helps:

janitor::tabyl(c(df$V1, df$V2))
#>  c(df$V1, df$V2) n   percent
#>              ABC 4 0.2222222
#>              DEF 4 0.2222222
#>              GHI 4 0.2222222
#>              JKL 3 0.1666667
#>              MNO 3 0.1666667

Created on 2021-07-03 by the reprex package (v2.0.0)
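
Note that this relies on c() preserving factor labels, which it does from R 4.1.0 on (where c() gained a factor method); on older versions the factors may need to be converted to character first. A minimal sketch of that variant:

# convert to character so the labels survive c() on R < 4.1.0 as well
janitor::tabyl(c(as.character(df$V1), as.character(df$V2)))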

Upvotes: 1

ThomasIsCoding

Reputation: 101568

We can try table + stack, dropping the Freq column and tabulating every label across the two remaining columns:

> rev(stack(table(unlist(df[-3]))))
  ind values
1 ABC      4
2 DEF      4
3 GHI      4
4 JKL      3
5 MNO      3

or an igraph option with degree, like below:

> library(igraph)
> rev(stack(degree(graph_from_data_frame(df))))
  ind values
1 DEF      4
2 GHI      4
3 JKL      3
4 MNO      3
5 ABC      4
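
Here graph_from_data_frame adds vertices in order of first appearance, so the degrees are not listed alphabetically; a small follow-up sketch that reorders the result by vertex name:

library(igraph)
res <- rev(stack(degree(graph_from_data_frame(df))))
# sort the rows alphabetically by vertex name to match the expected output
res[order(as.character(res$ind)), ]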

Upvotes: 2

Uwe

Reputation: 42544

If I understand correctly, the OP wants to count the occurrences of each item in both columns V1 and V2.

This can be solved by reshaping and counting the occurrences in long format:

library(data.table)
melt(setDT(df), measure.vars = patterns("^V"))[, .N, keyby = value]
   value N
1:   ABC 4
2:   DEF 4
3:   GHI 4
4:   JKL 3
5:   MNO 3
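
If the exact x/y column names from the question are wanted, the data.table result can be renamed in place; a minimal sketch using setnames:

library(data.table)
res <- melt(setDT(df), measure.vars = patterns("^V"))[, .N, keyby = value]
# rename the columns to the x/y layout shown in the question
setnames(res, c("value", "N"), c("x", "y"))[]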

or with dplyr & tidyr:

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(starts_with("V")) %>% 
  count(value)
# A tibble: 5 x 2
  value     n
  <fct> <int>
1 ABC       4
2 DEF       4
3 GHI       4
4 JKL       3
5 MNO       3
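
Because value stays a factor here, a level that happened to appear in neither column would be dropped by count() by default; a small hedged variant that keeps zero-count levels (the output is unchanged for this data, since all five levels occur):

library(dplyr)
library(tidyr)
# .drop = FALSE keeps factor levels with zero occurrences in the result
df %>% 
  pivot_longer(starts_with("V")) %>% 
  count(value, .drop = FALSE)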

Upvotes: 1

Anoushiravan R

Reputation: 21918

This solution can also be used:

library(dplyr)
library(purrr)
library(tibble)

df %>%
  select(V1, V2) %>%
  # collect every distinct label appearing in either column
  unlist() %>% 
  unique() %>% 
  enframe() %>%
  # for each label, count the rows of df in which it appears in V1 or V2
  mutate(freq = map_int(value, function(a) pmap_lgl(df[-3], ~ a %in% c(...)) %>% reduce(`+`))) %>%
  arrange(value) %>%
  select(-name)

# A tibble: 5 x 2
  value  freq
  <fct> <int>
1 ABC       4
2 DEF       4
3 GHI       4
4 JKL       3
5 MNO       3

Upvotes: 3

akrun

Reputation: 887168

We can do this in a couple of ways

  1. Split each column by the other into a list, get the lengths, add (+) them, and stack the result into a two-column data.frame:
stack(lengths(split(as.character(df$V2), df$V1)) + 
       lengths(split(as.character(df$V1), df$V2)))[2:1]

Output:

 ind values
1 ABC      4
2 DEF      4
3 GHI      4
4 JKL      3
5 MNO      3

  2. Or build a table of each column against the other, add (+) the two tables, take the rowSums, and stack it (same output as above):
stack(rowSums(table(df[2:1]) + table(df[1:2])))[2:1]

Upvotes: 2
