Reputation: 826
I have a large data frame in which, for each pair of individuals (rows), I need to find the columns where both rows hold the same value.
Here is an example of the dataframe:
>data
ID pos1234 pos1345 pos1456 pos1678
1 1 C A C G
2 2 C G A G
3 3 C A G A
4 4 C G C T
I transformed the dataframe into a pairwise matrix with:
apply(data, 2, combn, m=2)
ID pos1234 pos1345 pos1456 pos1678
[1,] "1" "C" "A" "C" "G"
[2,] "2" "C" "G" "A" "G"
[3,] "1" "C" "A" "C" "G"
[4,] "3" "C" "A" "G" "A"
[5,] "1" "C" "A" "C" "G"
[6,] "4" "C" "G" "C" "T"
[7,] "2" "C" "G" "A" "G"
[8,] "3" "C" "A" "G" "A"
[9,] "2" "C" "G" "A" "G"
[10,] "4" "C" "G" "C" "T"
[11,] "3" "C" "A" "G" "A"
[12,] "4" "C" "G" "C" "T"
I am now having trouble identifying the columns that contain identical letters within each pair. For example, for the pair 1 and 2, the columns containing identical letters would be pos1234 and pos1678.
Would it be possible to get a data frame with just the identical letters for each pair of individuals?
Thanks in advance.
Upvotes: 0
Views: 63
Reputation: 66819
You can pass a function to combn:
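# For each pair of rows x, keep the names of the columns with a single unique value.
# lapply (rather than sapply) always returns a list, so lengths() still counts per
# column even when every column differs and sapply would simplify to a matrix.
res0 <- combn(nrow(data), 2, FUN = function(x)
  names(data[-1])[ lengths(lapply(data[x,-1], unique)) == 1 ], simplify=FALSE)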
which gives
[[1]]
[1] "pos1234" "pos1678"
[[2]]
[1] "pos1234" "pos1345"
[[3]]
[1] "pos1234" "pos1456"
[[4]]
[1] "pos1234"
[[5]]
[1] "pos1234" "pos1345"
[[6]]
[1] "pos1234"
To see which pair each of [[1]]..[[6]] corresponds to, use combn again, this time on the IDs:
res <- setNames(res0, combn(data$ID, 2, paste, collapse="."))
which gives
$`1.2`
[1] "pos1234" "pos1678"
$`1.3`
[1] "pos1234" "pos1345"
$`1.4`
[1] "pos1234" "pos1456"
$`2.3`
[1] "pos1234"
$`2.4`
[1] "pos1234" "pos1345"
$`3.4`
[1] "pos1234"
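The list above holds column names only. If a data frame of the shared letters themselves is wanted, as the question asks, one possible sketch is below. It assumes the letter columns are character (not factor); the names pairs, snps and shared are made up for illustration, and the data frame is rebuilt from the question's example so the snippet runs on its own.

```r
# Example data from the question, with letters kept as character
data <- data.frame(
  ID      = 1:4,
  pos1234 = c("C", "C", "C", "C"),
  pos1345 = c("A", "G", "A", "G"),
  pos1456 = c("C", "A", "G", "C"),
  pos1678 = c("G", "G", "A", "T"),
  stringsAsFactors = FALSE
)

pairs <- combn(nrow(data), 2)   # 2 x 6 matrix: one column of row indices per pair
snps  <- as.matrix(data[-1])    # character matrix without the ID column

# One small data frame per pair, stacked into a long table
shared <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  same <- which(snps[i, ] == snps[j, ])   # matching column positions; which() drops NA
  data.frame(
    pair   = rep(paste(data$ID[i], data$ID[j], sep = "."), length(same)),
    column = colnames(snps)[same],
    letter = unname(snps[i, same]),
    stringsAsFactors = FALSE
  )
}))

print(shared)
```

This gives one row per (pair, column) match, so a pair with no shared column simply contributes no rows. For a very large data frame the rbind over all pairs may get slow; that trade-off is not measured here.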
Upvotes: 1