Reputation: 73
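I have a vector composed of entries such as "ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0", and so on, and I want to subset this vector based on conditions such as: the 3rd character is Z; the 3rd and 7th characters are both Z; the 3rd and 7th characters are Z and no other character is Z.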
I tried playing around with strsplit and grep, but I couldn't figure out a way to restrict my conditions based on the position of a character in the string. Any suggestions?
Many thanks!
Upvotes: 7
Views: 12472
Reputation: 121077
Expanding Josh's answer, you want
your_dataset <- data.frame(
z = c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")
)
# The third pattern ends in "[^Z]*$" so that no Z can appear after the 7th character
regexes <- c("^..Z", "^..Z...Z", "^[^Z]{2}Z[^Z]{3}Z[^Z]*$")
lapply(regexes, function(rx) {
  subset(your_dataset, grepl(rx, z))
})
Also consider replacing grepl(rx, z) with str_detect(z, rx), using the stringr package. (There's no real difference except for slightly more readable code.)
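If the goal is to see which strings satisfy which condition at a glance, one option is to build a logical matrix with one column per pattern. This is only a sketch: the last two entries of z are made-up sample data, and the third pattern is anchored with "$" so that a later Z disqualifies a string.

```r
# Sample data: the first two entries are from the question,
# the last two are hypothetical additions for illustration
z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0", "10Z001Z01010", "10Z001Z0Z010")

regexes <- c("^..Z", "^..Z...Z", "^[^Z]{2}Z[^Z]{3}Z[^Z]*$")

# One row per string, one column per pattern
matches <- sapply(regexes, grepl, x = z)

# Strings satisfying the second condition (3rd and 7th characters are Z)
z[matches[, 2]]
```

Each column of matches can then be used directly as a logical index into z or into the rows of a data frame.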
Upvotes: 4
Reputation: 61933
You can do the first two without regular expressions by using the substr function to pull out specific characters if you want.
# Grab the third character in each element and compare it to Z
substr(z, 3, 3) == "Z"
# Check if the 3rd and 7th characters are both Z
(substr(z, 3, 3) == "Z") & (substr(z, 7, 7) == "Z")
However, the regular expression approach Joshua gave is more flexible, and trying to implement your third restriction with substr would be a pain. Regular expressions are much better suited to a problem like your third restriction, and learning how to use them is never a bad idea.
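For comparison, here is one possible regex-free sketch of the third restriction. It uses strsplit and which rather than substr, and the last two entries of z are made-up sample data, so treat it as an illustration rather than a recommendation.

```r
# Sample data: the first two entries are from the question,
# the last two are hypothetical additions for illustration
z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0", "10Z001Z01010", "10Z001Z0Z010")

# Split each string into single characters and record where the Zs are
z_positions <- lapply(strsplit(z, ""), function(chars) which(chars == "Z"))

# Keep only strings whose Zs sit at exactly positions 3 and 7
only_3_and_7 <- vapply(
  z_positions,
  function(pos) identical(pos, c(3L, 7L)),
  logical(1)
)

z[only_3_and_7]
```

The substr comparisons above can be used the same way, for example z[substr(z, 3, 3) == "Z"].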
Upvotes: 2
Reputation: 176648
You can do this with regular expressions (see ?regexp for details on regular expressions).
grep returns the indices of the elements that match, or a zero-length vector if no match is found. You may want to use grepl instead, since it returns a logical vector you can use to subset.
z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")
# 3rd character is Z ("^" is start of string, "." is any character)
grep("^..Z", z)
# 3rd and 7th characters are Z
grep("^..Z...Z", z)
# 3rd and 7th characters are Z, no other characters are Z
# "[]" defines a "character class" and "^" in a character class negates the match
# "{n}" repeats the preceding match n times, "*" repeats it zero or more times
# "$" is end of string, so no Z can follow the matched characters
grep("^[^Z]{2}Z[^Z]{3}Z[^Z]*$", z)
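To actually subset the vector, the logical result of grepl can be used as an index. The sketch below assumes two extra made-up entries so that every condition matches something, and it anchors the last pattern with "$" so that a Z later in the string rules the entry out.

```r
# Sample data: the first two entries are from the question,
# the last two are hypothetical additions for illustration
z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0", "10Z001Z01010", "10Z001Z0Z010")

# 3rd character is Z
third_is_z <- z[grepl("^..Z", z)]

# 3rd and 7th characters are Z
third_and_seventh <- z[grepl("^..Z...Z", z)]

# 3rd and 7th characters are Z, and no other character is Z
only_third_and_seventh <- z[grepl("^[^Z]{2}Z[^Z]{3}Z[^Z]*$", z)]
```

Indexing with a logical vector stays well defined when nothing matches: the result is simply an empty character vector.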
Upvotes: 12