Shawn

Reputation: 172

Remove and set aside elements of vectors within a list that don't exist in another vector

I'm trying to form arguments for use in the reshape() function. I have a vector of column names, some of which should be merged by reshape() because they share the same letter at the end:

> v <- c("x","da","db","ea","eb","ec","fb")

Most of these column names are composed of a pre character followed by a post character. The pre part will become the timevar argument and the post part the v.names argument in reshape(). They are defined as:

> pre <- c("d","e","f")
> post <- c("a","b","c")

I have organized the problem this way because the number of columns I need to handle varies from file to file. By parsing the column names like this, I'm confident I can do it with an algorithm rather than a manual hack.
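If the column names follow the two-character pattern described above, pre and post could also be derived from v rather than typed in for each file. A minimal sketch, assuming every name except the known id column "x" is exactly one pre letter followed by one post letter (the helper name `coded` is hypothetical):

```r
v <- c("x","da","db","ea","eb","ec","fb")

# Assumption: "x" is the only id column; every other name is <pre><post>
coded <- setdiff(v, "x")
pre   <- unique(substr(coded, 1, 1))  # first character of each name
post  <- unique(substr(coded, 2, 2))  # second character of each name
pre   # "d" "e" "f"
post  # "a" "b" "c"
```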

My desired output is a list of vectors containing only the elements of v that share a post letter with at least one other element. The intention is to use these as the varying parameter in reshape():

> desired_lov
$a
[1] "da" "ea"

$b
[1] "db" "eb" "fb"

In addition, I would like to keep track of the elements of the original v vector that are missing from desired_lov. The intention is to use these as the idvar parameter in reshape():

> desired_idh
[1] "x" "ec"
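Once desired_lov exists, desired_idh is simply the set difference between v and everything used in desired_lov. A small sketch, with desired_lov typed in by hand for illustration:

```r
v <- c("x","da","db","ea","eb","ec","fb")
desired_lov <- list(a = c("da","ea"), b = c("db","eb","fb"))

# Every name in v that does not appear in any group of desired_lov
desired_idh <- setdiff(v, unlist(desired_lov))
desired_idh  # "x" "ec"
```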

Given all that, someone helped me build a list of vectors of possible column names from those prefixes and postfixes. Each vector in this list is named after an element of post, and I believe this is important for reshape(), since it will merge the columns in each vector under a common name:

> lov <- Map(function(x) paste0(pre,x),post)
> lov
$a
[1] "da" "ea" "fa"

$b
[1] "db" "eb" "fb"

$c
[1] "dc" "ec" "fc"

However, this builds more names from those combinations than actually exist in v, so I would like to keep track of which names in v do not exist in lov. I've tried:

> idh <- NULL
> Map(function(x) idh <- paste(idh,lov[[x]][lov[[x]] %in% v]),1:length(lov))
[[1]]
[1] " da" " ea"

[[2]]
[1] " db" " eb" " fb"

[[3]]
[1] " ec"

> idh
NULL

Apparently, though, Map() is not modifying the idh variable.
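In R, `<-` inside a function creates a new local variable, so each call made by Map() gets its own idh and the outer one is never touched (`<<-` would reach the outer variable, but returning values is the more common idiom). A minimal sketch of keeping Map()'s return value instead; `found` and `matched` are hypothetical names:

```r
v   <- c("x","da","db","ea","eb","ec","fb")
lov <- Map(function(x) paste0(c("d","e","f"), x), c("a","b","c"))

idh <- NULL
invisible(Map(function(x) idh <- "changed", 1:3))
print(idh)  # NULL: each assignment only created a local copy

# Keep what Map() returns rather than assigning inside the function
found   <- Map(function(x) x[x %in% v], lov)
matched <- unlist(found, use.names = FALSE)
print(matched)  # "da" "ea" "db" "eb" "fb" "ec"
```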

For the next step (after I figure out the part immediately above), to strip out the elements of lov that don't match v, I've tried:

>  Map(function(x) lov[[x]] <- lov[[x]][lov[[x]] %in% v],1:length(lov))
[[1]]
[1] "da" "ea"

[[2]]
[1] "db" "eb" "fb"

[[3]]
[1] "ec"

> lov
$a
[1] "da" "ea" "fa"

$b
[1] "db" "eb" "fb"

$c
[1] "dc" "ec" "fc"

This gives promising output (I would still need to remove every vector of length < 2 from the list, since I'm only looking for columns duplicated on their second character), but once again it fails to actually modify lov.
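The same filtering works if the result of lapply() is assigned back, and Filter() can then drop the groups that are too short while keeping the list names. A minimal sketch using the pre, post and lov definitions from the question:

```r
v    <- c("x","da","db","ea","eb","ec","fb")
pre  <- c("d","e","f")
post <- c("a","b","c")
lov  <- Map(function(x) paste0(pre, x), post)

# Reassign: keep only the candidate names that really occur in v
lov <- lapply(lov, function(cols) cols[cols %in% v])
# Drop groups with fewer than two columns, since there is nothing to merge
desired_lov <- Filter(function(cols) length(cols) >= 2, lov)
desired_lov  # $a "da" "ea"; $b "db" "eb" "fb"
```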

I've tried searching, but all I keep finding are ways to remove elements from a single vector. This seems to be a different problem, since I'm trying to remove elements from several vectors stored in a list while preserving the names of those vectors.

Edit: I do know about x ahead of time, so I can manually exclude it where needed. But I don't know ahead of time that c is a unique postfix (in this particular example), so that has to be determined within the script.
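Since only x is known in advance, one alternative that avoids building candidate names from pre and post at all is to group the remaining names by their last character with split(). This is a swapped-in technique, not the approach from the question; `known_ids` and `coded` are hypothetical names, and it assumes the post part is always the final character:

```r
v <- c("x","da","db","ea","eb","ec","fb")
known_ids <- "x"  # assumed known ahead of time

coded  <- setdiff(v, known_ids)
# Group names by their last character (the post letter)
groups <- split(coded, substring(coded, nchar(coded)))
# Only post letters shared by two or more columns are merged
desired_lov <- Filter(function(g) length(g) >= 2, groups)
desired_idh <- setdiff(v, unlist(desired_lov))
desired_idh  # "x" "ec"
```

Because the unique postfix c simply ends up in a group of length one, it is filtered out without being known beforehand.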

Upvotes: 0

Views: 88

Answers (1)

Pierre L

Reputation: 28441

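# Count how many names in v contain each post letter
freq <- sapply(Map(function(x) grep(x, v), post), length)
# Positions in v for each post letter that occurs more than once
index <- Map(function(x) grep(x, v), names(freq)[freq > 1])
lapply(index, function(x) v[x])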
$a
[1] "da" "ea"

$b
[1] "db" "eb" "fb"

and

v[-unlist(index)]
[1] "x"  "ec"

Data

v <- c("x","da","db","ea","eb","ec","fb")
pre <- c("d","e","f")
post <- c("a","b","c")
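Two details of the answer may matter on other files: grep(x, v) matches the letter anywhere in a name, not only at the end, and v[-unlist(index)] returns an empty vector when no group qualifies (because v[-integer(0)] selects nothing). A sketch of a variant that anchors the pattern with `$` and uses setdiff() instead; `hits`, `groups` and `idh` are hypothetical names:

```r
v    <- c("x","da","db","ea","eb","ec","fb")
post <- c("a","b","c")

# Anchor the pattern so only names ending in the post letter match
hits   <- Map(function(x) grep(paste0(x, "$"), v, value = TRUE), post)
groups <- Filter(function(h) length(h) > 1, hits)
# setdiff() still returns all of v when groups is empty
idh    <- setdiff(v, unlist(groups))
idh  # "x" "ec"
```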

Upvotes: 1
