Reputation: 4229
I have a character vector like this:
sampleData <- c("This is what I see i.r.o what not i.r.o",
"Similar here a.s. also this a.s.",
"One more i.r.o. now another i.r.o.")
I would like to remove everything after the first occurrence of i.r.o (or i.r.o.), and likewise after the first a.s (or a.s.).
So that the final version looks like this:
1 This is what I see i.r.o
2 Similar here a.s.
3 One more i.r.o.
EDIT: I corrected the gaps between i.r.o and a.s with gsub(), so the expressions are now identical character for character. See the example above.
Upvotes: 0
Views: 189
Reputation: 226971
I'm a little confused because the comments above suggest that you've gotten the answer, but I don't see it.
This seems to work:
sampleData <- c("This is what I see i.r.o what not i.r.o",
"Similar here a.s. also this a.s.",
"One more i.r.o. now another i.r.o.")
gsub("(([[:alpha:]]\\.)+[[:alpha:]][.]?) .*$","\\1",sampleData)
## [1] "This is what I see i.r.o" "Similar here a.s."
## [3] "One more i.r.o."
The regex reads: "(one or more of (an alphabetic character followed by a dot), followed by another alphabetic character, possibly followed by a dot), followed by a space and zero or more of any character, up to the end of the line". The replacement "\\1" keeps only what matched inside the outer set of parentheses, so the space and everything after the first abbreviation is dropped.
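If it helps to see the pieces separately, here is a minimal sketch that builds the same pattern from named parts. The variable names (abbrev, trailing, generic, specific) are only illustrative. It uses sub() rather than gsub(), on the assumption that a pattern ending in .*$ can match at most once per string, so the two should give the same result here. The second pattern is a narrower variant that only targets i.r.o and a.s, which may be safer if the text could contain other dotted abbreviations such as e.g. before the one you care about.

```r
sampleData <- c("This is what I see i.r.o what not i.r.o",
                "Similar here a.s. also this a.s.",
                "One more i.r.o. now another i.r.o.")

# One or more letter-dot pairs, then a final letter and an optional dot
abbrev <- "(([[:alpha:]]\\.)+[[:alpha:]][.]?)"
# A space plus everything up to the end of the string
trailing <- " .*$"

# Generic version: same pattern as above, assembled from its parts
generic <- paste0(abbrev, trailing)
resultGeneric <- sub(generic, "\\1", sampleData)

# Narrower variant (an assumption about what is wanted):
# only i.r.o or a.s, each optionally followed by a dot
specific <- paste0("((i\\.r\\.o|a\\.s)\\.?)", trailing)
resultSpecific <- sub(specific, "\\1", sampleData)

print(resultGeneric)
print(resultSpecific)
```

Both versions keep the first abbreviation and drop the space and everything after it. Strings with no match are returned unchanged.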
Upvotes: 2