Maximilian

Reputation: 4229

Shorten length of character in R

I have a character vector that looks something like this:

sampleData <- c("This is what I see i.r.o what not i.r.o",
                "Similar here a.s. also this a.s.",
                "One more i.r.o. now another i.r.o.") 

I would like to remove everything after the first occurrence of i.r.o or i.r.o., and likewise after the first occurrence of a.s or a.s..

So that the final version looks like this:

1 This is what I see i.r.o 
2 Similar here a.s. 
3 One more i.r.o. 

EDIT: I corrected the gaps between the i.r.o and a.s with gsub(), so the expressions are now identical in each string. See the example above.

Upvotes: 0

Views: 189

Answers (1)

Ben Bolker

Reputation: 226971

I'm a little confused because the comments above suggest that you've gotten the answer, but I don't see it.

This seems to work:

sampleData <- c("This is what I see i.r.o what not i.r.o",
                "Similar here a.s. also this a.s.",
                "One more i.r.o. now another i.r.o.")
gsub("(([[:alpha:]]\\.)+[[:alpha:]][.]?) .*$","\\1",sampleData)
## [1] "This is what I see i.r.o" "Similar here a.s."       
## [3] "One more i.r.o."         

The regex reads: "(one or more of (an alphabetic character followed by a dot), followed by another alphabetic character, possibly followed by a dot), followed by a space and zero or more of any character, followed by the end of the line". The replacement "\\1" keeps only what the outer set of parentheses matched and drops everything after it.
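If only the two abbreviations from the question matter, a narrower pattern may be easier to read. The following is a minimal sketch, assuming that "i.r.o" and "a.s" (each with an optional trailing dot) are the only cases to handle; the names `pattern`, `shortened` and `untouched` are made up for illustration. It also uses sub() rather than gsub(), since the ".*$" part already consumes the rest of the string after the first match.

```r
# Same sample data as in the question.
sampleData <- c("This is what I see i.r.o what not i.r.o",
                "Similar here a.s. also this a.s.",
                "One more i.r.o. now another i.r.o.")

# Assumption: only "i.r.o" and "a.s", optionally followed by a dot, need handling.
# The outer group is what "\\1" keeps; the space and ".*$" are what gets dropped.
pattern <- "((i\\.r\\.o|a\\.s)\\.?) .*$"

# sub() replaces at most one match per string, starting at the leftmost one.
shortened <- sub(pattern, "\\1", sampleData)
print(shortened)

# Strings without a match (no abbreviation, or nothing after it) are returned as-is.
untouched <- sub(pattern, "\\1", c("No abbreviation here", "Ends with a.s."))
print(untouched)
```

Note that this sketch does not check word boundaries, so an abbreviation embedded in a longer token would also match; the more general [[:alpha:]] pattern above has the same property.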

Upvotes: 2
