Reputation: 1134
Here is the dataset
structure(list(V1 = structure(c(2L, 3L, 4L, 5L, 3L, 4L, 5L, 4L,
5L), .Label = c("ABC", "DEF", "GHI", "JKL", "MNO"), class = "factor"),
V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("ABC",
"DEF", "GHI", "JKL", "MNO"), class = "factor"), Freq = c(3L,
2L, 3L, 2L, 3L, 1L, 2L, 1L, 1L)), row.names = c(NA, 9L), class = "data.frame")
which looks like this
V1 V2 Freq
1 DEF ABC 3
2 GHI ABC 2
3 JKL ABC 3
4 MNO ABC 2
5 GHI DEF 3
6 JKL DEF 1
7 MNO DEF 2
8 JKL GHI 1
9 MNO GHI 1
I would like to create something like this
x y
ABC 4
DEF 4
GHI 4
JKL 3
MNO 3
Note that ABC has an association 4 times with DEF, GHI, JKL, and MNO. This corresponds to line 1,2,3 and 4. JKL has an association 3 times with ABC,DEF, and GHI. This corresponds to line 3,6,and 8.
Upvotes: 3
Views: 74
Reputation: 26218
library janitor
also helps
janitor::tabyl(c(df$V1, df$V2))
#> c(df$V1, df$V2) n percent
#> ABC 4 0.2222222
#> DEF 4 0.2222222
#> GHI 4 0.2222222
#> JKL 3 0.1666667
#> MNO 3 0.1666667
Created on 2021-07-03 by the reprex package (v2.0.0)
Upvotes: 1
Reputation: 101568
We can try table
+ stack
> rev(stack(table(unlist(df[-3]))))
ind values
1 ABC 4
2 DEF 4
3 GHI 4
4 JKL 3
5 MNO 3
or an igraph
option with degree
like below
> rev(stack(degree(graph_from_data_frame(df))))
ind values
1 DEF 4
2 GHI 4
3 JKL 3
4 MNO 3
5 ABC 4
Upvotes: 2
Reputation: 42544
If I understand correctly, the OP wants to count the occurrences of each item in both columns V1
and V2
.
This can be solved by reshaping and counting the occurrences in long format:
library(data.table)
melt(setDT(df), measure.vars = patterns("^V"))[, .N, keyby = value]
value N 1: ABC 4 2: DEF 4 3: GHI 4 4: JKL 3 5: MNO 3
or with dplyr
& tidyr
:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(starts_with("V")) %>%
count(value)
# A tibble: 5 x 2 value n <fct> <int> 1 ABC 4 2 DEF 4 3 GHI 4 4 JKL 3 5 MNO 3
Upvotes: 1
Reputation: 21918
This solution can also be used:
library(dplyr)
library(purrr)
library(tibble)
df %>%
select(V1, V2) %>%
unlist() %>%
unique() %>%
enframe() %>%
mutate(freq = map_int(value, function(a) pmap_lgl(df[-3], ~ a %in% c(...)) %>% reduce(`+`))) %>%
arrange(value) %>%
select(-name)
# A tibble: 5 x 2
value freq
<fct> <int>
1 ABC 4
2 DEF 4
3 GHI 4
4 JKL 3
5 MNO 3
Upvotes: 3
Reputation: 887168
We can do this in a couple of ways
split
, each column by the other into a list
. Get the lengths
, add (+
) and stack
it to 2 column data.framestack(lengths(split(as.character(df1$V2), df1$V1)) +
lengths(split(as.character(df1$V1), df1$V2)))[2:1]
-output
ind values
1 ABC 4
2 DEF 4
3 GHI 4
4 JKL 3
5 MNO 3
table
on the columns, +
them, get the rowSums
, and stack
itstack( rowSums(table(df1[2:1]) + table(df1[1:2])))[2:1]
Upvotes: 2