I have two long lists, A and B, which have the same length but whose corresponding fields contain different numbers of elements:
Each field of A can contain several elements, and the same element can occur more than once in a field.
Each field of B contains either exactly one element or nothing, i.e. character(0).
Some fields of A are empty, but for those records B always has an element, so no record is empty in both A and B.
I want to combine A and B into a new list C of the same length: each field of C keeps all elements of the corresponding field of A, including repeats, and the element of B is appended unless it already occurs in that field of A.
This is an example of how these lists begin:
> A
[1] "JAMES" "JAMES"
[2] "JOHN" "ROBERT"
[3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"
[4] character(0)
...
> B
[1] "RICHARD"
[2] "JOHN"
[3] character(0)
[4] "CHARLES"
...
This is the correct output I'm looking for:
> C
[1] "JAMES" "JAMES" "RICHARD"
[2] "JOHN" "ROBERT"
[3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"
[4] "CHARLES"
...
I tried, e.g.:
C <- sapply(mapply(union, A,B), setdiff, character(0))
But this deleted the recurrences from A, unfortunately:
> C
[1] "JAMES" "RICHARD"
[2] "JOHN" "ROBERT"
[3] "WILLIAM" "MICHAEL" "DAVID"
[4] "CHARLES"
...
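Both steps of this attempt deduplicate. union(x, y) returns the unique values of c(x, y), and setdiff() also returns unique values, so the second "JAMES" is already gone after union(). Plain c() keeps every occurrence. A minimal check on the first record:

```r
a <- c("JAMES", "JAMES")   # first field of A
b <- "RICHARD"             # first field of B

u <- union(a, b)               # unique(c(a, b)): the repeated "JAMES" collapses
s <- setdiff(u, character(0))  # setdiff() also returns unique values
k <- c(a, b)                   # concatenation keeps every occurrence

print(u)  # "JAMES" "RICHARD"
print(s)  # "JAMES" "RICHARD"
print(k)  # "JAMES" "JAMES" "RICHARD"
```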
Can anybody tell me, please, how to combine these two lists, preserve the recurrences from A, and achieve the output I desire?
Thank you very much in advance!
Update: Machine readable data:
A <- list(c("JAMES","JAMES"),
c("JOHN","ROBERT"),
c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),
character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")
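The constraints described at the top of the question can be checked mechanically before combining the lists. A minimal sketch using base R's lengths(), which returns the length of each list element:

```r
A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"),
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")

# Both lists describe the same records
stopifnot(length(A) == length(B))
# Every field of B holds at most one element
stopifnot(all(lengths(B) <= 1))
# No record is empty in both A and B
stopifnot(all(lengths(A) > 0 | lengths(B) > 0))

lengths(A)  # 2 2 5 0
```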
Here is your snippet of data, in reproducible form:
A <- list(c("JAMES","JAMES"),
c("JOHN","ROBERT"),
c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),
character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")
You were close with mapply(). I got the desired output by using c() to concatenate the list elements of A and B, but I had to handle the elements of the supplied vectors case by case, so I came up with this:
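foo <- function(...) {
  l1 <- length(..1)   # number of elements in the current field of A
  l2 <- length(..2)   # number of elements in the current field of B (0 or 1)
  out <- character(0)
  if(l1 > 0) {
    if(l2 > 0) {
      # append B's element only if it is not already in A's field
      out <- if(..2 %in% ..1)
        ..1
      else
        c(..1, ..2)
    } else {
      out <- ..1   # B is empty: keep A's field as is, repeats included
    }
  } else {
    out <- ..2     # A is empty: take B's element
  }
  out
}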
We can refer to the individual arguments passed via ... using the ..n placeholders; in each call made by mapply(), ..1 is the current element of A and ..2 is the current element of B. Of course, foo() only works with two lists, but it doesn't enforce this or do any checking, just to keep things simple. foo() also needs to handle the cases where the element of A, the element of B, or both are character(0), which I now think foo() does.
When we use that in the mapply() call, I get:
> mapply(foo, A, B)
[[1]]
[1] "JAMES" "JAMES" "RICHARD"
[[2]]
[1] "JOHN" "ROBERT"
[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"
[[4]]
[1] "CHARLES"
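One caveat: with its default SIMPLIFY = TRUE, mapply() returns a list here only because the combined fields have different lengths. If every result happened to have the same length, mapply() would return a matrix instead. Map(), or mapply(..., SIMPLIFY = FALSE), always returns a list. A small sketch with hypothetical data A2 and B2, using a compact stand-in for foo() (the name combine_record() is hypothetical) that follows the same rules, assuming each field of B has at most one element:

```r
# Hypothetical helper with the same rules as foo(), using named arguments
combine_record <- function(a, b) {
  # keep a unchanged if b is empty or already present in a; otherwise append b
  if (length(b) == 0 || all(b %in% a)) a else c(a, b)
}

# Hypothetical data where every combined field has length 2
A2 <- list(c("X", "Y"), c("P", "Q"))
B2 <- list("X", "Q")

res_simplified <- mapply(combine_record, A2, B2)   # simplified to a 2 x 2 matrix
res_map <- Map(combine_record, A2, B2)             # always a list
res_list <- mapply(combine_record, A2, B2, SIMPLIFY = FALSE)

is.matrix(res_simplified)     # TRUE
identical(res_map, res_list)  # TRUE
```

With the question's data, Map(foo, A, B) returns the same list as the mapply() call above.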
An lapply() version may be easier to follow than the abstract ..n placeholders, but it uses essentially the same code. Here is a new function that works with A and B directly; we iterate over the indices of the elements of A (1, 2, ..., length(A)) as generated by seq_along():
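seq_along() is preferable to 1:length(A) for generating these indices because it also behaves correctly for an empty list: seq_along(list()) is integer(0), so lapply() makes no calls, whereas 1:length(list()) is 1:0, i.e. the indices 1 and 0, neither of which exists in an empty list. A quick check:

```r
A <- list(c("JAMES","JAMES"), c("JOHN","ROBERT"),
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"), character(0))

idx <- seq_along(A)              # 1 2 3 4
empty_idx <- seq_along(list())   # integer(0)
colon_idx <- 1:length(list())    # 1 0, not what we want for an empty list

# lapply() over integer(0) never calls the function and returns an empty list
res <- lapply(empty_idx, function(i) stop("never called"))
length(res)  # 0
```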
foo2 <- function(ind, A, B) {
  l1 <- length(A[[ind]])
  l2 <- length(B[[ind]])
  out <- character(0)
  if(l1 > 0) {
    if(l2 > 0) {
      out <- if(B[[ind]] %in% A[[ind]]) {
        A[[ind]]
      } else {
        c(A[[ind]], B[[ind]])
      }
    } else {
      out <- A[[ind]]
    }
  } else {
    out <- B[[ind]]
  }
  out
}
which is called like this:
> lapply(seq_along(A), foo2, A = A, B = B)
[[1]]
[1] "JAMES" "JAMES" "RICHARD"
[[2]]
[1] "JOHN" "ROBERT"
[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"
[[4]]
[1] "CHARLES"
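For comparison, the same rules can be expressed more compactly with setdiff() applied only to B's field: c(a, setdiff(b, a)) appends whatever is in b but not in a. setdiff() deduplicates its result, but since it never touches a, the repeats in A survive. This is a sketch (the helper name combine_setdiff() is hypothetical), assuming as in the question that each field of B holds at most one element; with the question's data it should give the same list as the two versions above.

```r
A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"),
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")

# Hypothetical helper: append only the values of b that are missing from a.
# setdiff() deduplicates, but only b passes through it, so a keeps its repeats.
combine_setdiff <- function(a, b) c(a, setdiff(b, a))

C <- Map(combine_setdiff, A, B)
C[[1]]  # "JAMES" "JAMES" "RICHARD"
C[[4]]  # "CHARLES"
```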