Reputation: 49
I have a character vector made up of filenames like:
vector <- c("LR1_0001_a", "LR1_0002_b", "LR02_0001_b", "LR02_0002_x", "LR3_001_c")
My goal is to subset this vector by the prefix that runs up to the first "_", where the prefix length varies from name to name. The outputs would look something like this:
solution1 <- c("LR1_0001_a", "LR1_0002_b")
solution2 <- c("LR02_0001_b", "LR02_0002_x")
solution3 <- c("LR3_001_c")
I have experimented with mixtures of unique and grep but have not had any luck so far.
Upvotes: 0
Views: 432
Reputation: 887213
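We can use trimws with a whitespace pattern that matches everything from the first underscore onward (the whitespace argument needs R 3.6.0 or later), and split on the remaining prefix:
out <- split(vector, trimws(vector, whitespace = "_.*"))
and then use list2env to create one object per prefix in the global environment:
list2env(out, .GlobalEnv)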
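Before splitting, it can help to look at the grouping key on its own. The following is a minimal sketch, assuming the key should be everything before the first underscore; the names files, key_sub and key_split are illustrative and not from the question.

```r
# Illustrative data, copied from the question
files <- c("LR1_0001_a", "LR1_0002_b", "LR02_0001_b", "LR02_0002_x", "LR3_001_c")

# Two base R ways to take everything before the first "_"
key_sub   <- sub("_.*", "", files)
key_split <- vapply(strsplit(files, "_", fixed = TRUE),
                    function(parts) parts[1], character(1))

# Both approaches should agree on the key
stopifnot(identical(key_sub, key_split))

# One entry per distinct prefix, in order of first appearance
unique(key_sub)
```

If the key has as many distinct values as there are files, the pattern is probably stripping too little.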
Upvotes: 0
Reputation: 5788
Base R solution (coerce vector to data.frame):
# Split the vector into a list keyed by prefix (as in Ronak's answer):
vect_list <- split(vect, sub("_.*", "", vect))
# Pad each vector in the list with NA to the length of the longest vector:
max_len <- max(lengths(vect_list))
padded_vect_list <- lapply(vect_list, function(x) {
  length(x) <- max_len
  x
})
# Bind the padded vectors as columns and coerce to a data.frame:
df <- data.frame(do.call("cbind", padded_vect_list))
Data:
vect <- c("LR1_0001_a", "LR1_0002_b", "LR02_0001_b", "LR02_0002_x", "LR3_001_c")
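The padding step relies on the fact that assigning a larger length to a vector extends it with NA. A small sketch of that behaviour on its own, using a made-up one-element vector:

```r
x <- c("LR3_001_c")   # made-up one-element character vector
length(x) <- 2        # extending a vector fills the new slots with NA
x
# [1] "LR3_001_c" NA
```

This is why the shorter groups end up with NA in the trailing rows of the data.frame.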
Upvotes: 0
Reputation: 389047
We can use sub to remove everything from the first underscore "_" onward and split the vector on the result.
output <- split(vector, sub('_.*', '', vector))
output
#$LR02
#[1] "LR02_0001_b" "LR02_0002_x"
#$LR1
#[1] "LR1_0001_a" "LR1_0002_b"
#$LR3
#[1] "LR3_001_c"
This returns a list of vectors, which is usually a better way to manage data than creating a number of objects in the global environment. However, if you want them as separate vectors, we can use list2env.
list2env(output, .GlobalEnv)
This will create vectors named LR02, LR1 and LR3.
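If you keep the result as a list, each group is still easy to reach by name. A brief sketch of working with the list directly, assuming the same data as in the question; files, groups and first_files are illustrative names.

```r
# Illustrative data, copied from the question
files <- c("LR1_0001_a", "LR1_0002_b", "LR02_0001_b", "LR02_0002_x", "LR3_001_c")
groups <- split(files, sub("_.*", "", files))

# Pull out one group by its prefix
groups[["LR1"]]

# Count the files under each prefix
lengths(groups)

# Apply the same operation to every group, e.g. take the first file of each
first_files <- vapply(groups, function(g) g[1], character(1))
first_files
```

Looping over the list this way avoids having to know the prefix names in advance, which is the main drawback of separate global objects.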
Upvotes: 3