Maximilian

Reputation: 4229

Limit character length after specific word in R

I have a vector of names that I would like to clean by shortening each string:

Example:

x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)") 

In this example I would like to keep only the first "LambMa, a.b.c." and cut off the rest. If a string does not contain a.b.c. twice, do nothing (skip it).

So the expression to look for is "a.b.c", and everything after its first occurrence should be cut.

EDIT: I would like to keep only the characters up to and including the first a.b.c. in each element of x, but only when a.b.c. occurs twice in that string.

The solution to the example above would be:

solution <- c("LambMa, a.b.c.","LambMa, a.b.c., LaMa (shorter wording)") 

EDIT 2: A partial solution would also be very helpful and would be accepted. Thanks.

Upvotes: 0

Views: 349

Answers (2)

Cath

Reputation: 24074

x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)") 

occ_abc <- gregexpr("a.b.c", x, fixed = TRUE) # find the occurrences of the literal text "a.b.c" (fixed = TRUE: dots are not wildcards)
for (i in seq_along(occ_abc)) { # for each item of x
    if (length(occ_abc[[i]]) >= 2) { # if there are 2 or more occurrences
      x[i] <- substr(x[i], 1, occ_abc[[i]][1] + 5) # keep the string up to the end of the first "a.b.c."
    } else { # else leave the item untouched
      x[i]
    }
}

> x

[1] "LambMa, a.b.c."                         "LambMa, a.b.c., LaMa (shorter wording)"

The if...else part can very probably be replaced by an ifelse statement.
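As a rough sketch of that idea, the loop could be written with the vectorized ifelse() along the following lines. The helper names n_occ, first_pos and result are made up for this illustration, and fixed = TRUE is an assumption that "a.b.c" should be matched literally rather than as a regular expression.

```r
x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)")

# Positions of the literal text "a.b.c" in each element of x
occ_abc <- gregexpr("a.b.c", x, fixed = TRUE)

# Number of matches per element; gregexpr() returns -1 when nothing matches
n_occ <- vapply(occ_abc, function(m) sum(m > 0), integer(1))

# Start position of the first match per element (-1 if there is none)
first_pos <- vapply(occ_abc, function(m) m[1], integer(1))

# Truncate only where "a.b.c" occurs at least twice;
# + 5 keeps "a.b.c" plus the dot that follows it
result <- ifelse(n_occ >= 2, substr(x, 1, first_pos + 5), x)
```

Unlike the loop, this leaves x itself untouched and returns the cleaned strings in result.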

Upvotes: 2

James

Reputation: 66834

You can use gsub to replace the matched text whenever the pattern you specified matches. To avoid using a look-behind, capture the first a.b.c. and use it as the replacement:

gsub("(a\\.b\\.c\\.).+(a\\.b\\.c)","\\1",x)
[1] "LambMa, a.b.c."                        
[2] "LambMa, a.b.c., LaMa (shorter wording)"
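One thing to be aware of: the pattern above stops at the last a.b.c, so any text following it would stay in the result. If that matters, a possible variant (a sketch, not part of the original answer) anchors the pattern to the whole string and uses a lazy quantifier with perl = TRUE. The input below is a made-up example with trailing text after the second a.b.c, and the names pattern and trimmed are only illustrative.

```r
# Hypothetical input: the first element has text after the second "a.b.c"
x <- c("LambMa, a.b.c., LaMa, a.b.c, trailing text",
       "LambMa, a.b.c., LaMa (shorter wording)")

# Group 1: everything up to and including the first "a.b.c." (lazy .*?)
# The rest of the pattern requires a later "a.b.c" and consumes the remainder
pattern <- "^(.*?a\\.b\\.c\\.).*a\\.b\\.c.*$"

# Elements without a second "a.b.c" do not match and are returned unchanged
trimmed <- sub(pattern, "\\1", x, perl = TRUE)
```

Here sub() is enough because the anchored pattern can match each string at most once.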

Upvotes: 2
