Reputation: 3689
I have a bunch of strings of mixed length, all with a year embedded. I am trying to extract just the text part, that is, everything up to where the number starts, and I am having problems with lookahead assertions, assuming that is the proper way to do this kind of extraction.
Here is what I have (returns no match):
> grep("\\b.(?=\\d{4})", "foo_1234_bar", perl = TRUE, value = TRUE)
In the example I am looking to extract just foo, but there may be several text parts, of mixed lengths, separated by _ before the year portion.
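For reference, here is a minimal sketch of a lookahead that does match. It assumes the year is always exactly four digits preceded by an underscore. Note that grep() with value = TRUE returns the whole matching element, not the matched text, so the extraction below uses regexpr() with regmatches() instead. The second input string is a hypothetical example with two text parts.

```r
# Hypothetical inputs: one text part, and two text parts separated by "_"
x <- c("foo_1234_bar", "foo_baz_2012_bar")

# ^.*?        lazily match as little as possible from the start
# (?=_\d{4})  stop where "_" followed by four digits comes next;
#             the lookahead is zero-width, so it is not part of the match
prefix <- regmatches(x, regexpr("^.*?(?=_\\d{4})", x, perl = TRUE))
prefix
```

Elements with no match are dropped by regmatches(), so the result can be shorter than x.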
Upvotes: 1
Views: 593
Reputation: 109984
Another approach (I often find that strsplit is faster than regex searching, but not always), though this does use a slight bit of regexing:
x <- c("asdfas_1987asdf", "asd_das_12") # shamelessly stealing Dason's example
sapply(strsplit(x, "[0-9]+"), "[[", 1) # keep the piece before the first run of digits
# returns "asdfas_" "asd_das_"
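A slightly stricter variant of the same strsplit idea, offered as a sketch rather than a replacement: vapply() is used instead of sapply() so that R checks each call returns exactly one string, and `[` is used to take the first piece.

```r
x <- c("asdfas_1987asdf", "asd_das_12")

# strsplit() breaks each string at every run of digits;
# vapply() checks that each call returns exactly one character
# string, so the result is always a plain character vector
first_parts <- vapply(strsplit(x, "[0-9]+"), `[`, character(1), 1)
first_parts
```

Like the original, this keeps the trailing underscore ("asdfas_").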
Upvotes: 2
Reputation: 7361
Look-aheads may be overkill here. Use the underscore and the 4 digits as the structure, combined with a non-greedy quantifier to prevent the 'dot' from gobbling up everything:
/(.+?)_\d{4}/
The first capturing group ($1) then holds 'foo'.
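In R, one way to apply this pattern is sub() with a back-reference. This is a minimal sketch that assumes a single year of exactly four digits preceded by an underscore. The second input string is hypothetical, and the trailing .* is added so the whole string is replaced by the captured group.

```r
# Hypothetical inputs: one text part and two text parts
x <- c("foo_1234_bar", "foo_baz_2012_bar")

# (.+?) lazily captures the text before the first "_" plus four digits;
# .*$ consumes the rest, so the whole string is replaced by group \1
prefix <- sub("^(.+?)_\\d{4}.*$", "\\1", x, perl = TRUE)
prefix
```

Strings that do not match the pattern are returned unchanged by sub().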
Upvotes: 5
Reputation: 61953
This will grab everything up to the first digit:
x <- c("asdfas_1987asdf", "asd_das_12")
regmatches(x, regexpr("^[^[:digit:]]*", x))
#[1] "asdfas_" "asd_das_"
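Since the question asks for foo rather than foo_, here is a sketch of the same "up to the first digit" idea that deletes the tail instead of matching the head, and also drops an underscore sitting directly before the first digit. It assumes at most one such underscore.

```r
x <- c("asdfas_1987asdf", "asd_das_12")

# Delete an optional "_" together with the first digit and
# everything after it, leaving only the leading text part
text_part <- sub("_?[[:digit:]].*$", "", x)
text_part
```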
Upvotes: 4