Reputation: 3
I want to find the number of all common elements in rows of a dataframe
name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L
name common name
x1 6 x1
x1 2 x2
x1 - x3
x1 - x4
x2 - x1
x2 5 - x2
x2 - x3
x2 - x4
x3 - x1
x3 - x2
x3 6 - x3
x3 - x4
x4 - x1
x4 - x2
x4 - x3
x4 7 -x4
Upvotes: 0
Views: 699
Reputation: 269501
1) dplyr/tidyr For each row, create a separate row for each member using separate_rows
and join that to itself by members
. Then compute counts and complete it.
library(dplyr)
library(tidyr)
DF %>%
separate_rows(members) %>%
distinct %>%
inner_join(., ., by = "members") %>%
count(name.x, name.y) %>%
complete(name.x, name.y)
giving:
# A tibble: 16 x 3
name.x name.y n
<chr> <chr> <int>
1 x1 x1 6
2 x1 x2 1
3 x1 x3 1
4 x1 x4 1
5 x2 x1 1
6 x2 x2 5
7 x2 x3 1
8 x2 x4 2
9 x3 x1 1
10 x3 x2 1
11 x3 x3 6
12 x3 x4 4
13 x4 x1 1
14 x4 x2 2
15 x4 x3 4
16 x4 x4 6
2) Base R Create a function which computes the number of intersections between two member components. Then use outer
to apply it to each pair and convert to a data.frame.
Scan <- function(x) scan(text = x, what = "", sep = ",", quiet = TRUE)
countSame <- function(x, y) length(intersect(Scan(x), Scan(y)))
x <- setNames(DF$members, DF$name)
as.data.frame.table(outer(x, x, Vectorize(countSame)))
giving:
Var1 Var2 Freq
1 x1 x1 6
2 x2 x1 1
3 x3 x1 1
4 x4 x1 1
5 x1 x2 1
6 x2 x2 5
7 x3 x2 1
8 x4 x2 2
9 x1 x3 1
10 x2 x3 1
11 x3 x3 6
12 x4 x3 4
13 x1 x4 1
14 x2 x4 2
15 x3 x4 4
16 x4 x4 6
Although the question asked for the data.frame form above you might prefer a 2d table which can be produced by just omitting as.data.frame.table
from the last line of code.
x1 x2 x3 x4
x1 6 1 1 1
x2 1 5 1 2
x3 1 1 6 4
x4 1 2 4 6
2a) A variation on (2) which is only two lines long can be formed by applying strsplit to the members and then computing the length of the intersection of pairs using outer. Finally we convert to a data frame. A 2d table could again be formed by omitting as.data.frame.table
.)
x <- with(DF, setNames(strsplit(members, ","), name))
as.data.frame.table(outer(x, x, Vectorize(function(x, y) length(intersect(x, y)))))
giving:
Var1 Var2 Freq
1 x1 x1 6
2 x2 x1 1
3 x3 x1 1
4 x4 x1 1
5 x1 x2 1
6 x2 x2 5
7 x3 x2 1
8 x4 x2 2
9 x1 x3 1
10 x2 x3 1
11 x3 x3 6
12 x4 x3 4
13 x1 x4 1
14 x2 x4 2
15 x3 x4 4
16 x4 x4 6
Lines <- "name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
In (2a) with R 4.3 as.data.frame.table
could be replaced with array2DF
.
Upvotes: 2
Reputation: 76402
I believe the following code does what the question asks for. But note that I find it complicated, with two merge
instructions, maybe someone else will find a simpler solution.
fun <- function(DF){
ex <- expand.grid(Var2 = DF[['name']], name = DF[['name']])[2:1]
members <- as.character(DF[['members']])
merge(DF, ex)
}
tmp <- merge(df1, fun(df1))
o <- order(tmp[[3]])
tmp$members2 <- tmp$members[o]
tmp$common <- apply(tmp[c(2, 4)], 1, function(x){
y1 <- unlist(strsplit(as.character(x[1]), ","))
y2 <- unlist(strsplit(as.character(x[2]), ","))
length(intersect(y1, y2))
})
res <- tmp[c(1, 5, 3)]
names(res)[3] <- "name2"
head(res)
# name common name2
#1 x1 6 x1
#2 x1 1 x2
#3 x1 1 x3
#4 x1 1 x4
#5 x2 1 x1
#6 x2 5 x2
Final clean up.
rm(tmp)
Data.
df1 <- read.table(text = "
name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L
", header = TRUE)
Upvotes: 0