Reputation: 4229
I have a vector of names that I would like to clean by shortening each string.
Example:
x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
"LambMa, a.b.c., LaMa (shorter wording)")
In this example I would like to keep only the first "LambMa, a.b.c." and cut off the rest. If a string does not contain a.b.c. twice, do nothing (skip it). So the expression to look for is "a.b.c", and everything after its first occurrence should be cut.
EDIT: I would like to keep only the characters up to and including a.b.c. from each element of x, but only when a.b.c. occurs twice in that string.
The solution to the example above would be:
solution <- c("LambMa, a.b.c.","LambMa, a.b.c., LaMa (shorter wording)")
EDIT 2: A partial solution would also be very helpful and would be accepted. Thanks.
Upvotes: 0
Views: 349
Reputation: 24074
x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
"LambMa, a.b.c., LaMa (shorter wording)")
occ_abc <- gregexpr("a.b.c", x, fixed = TRUE)    # find the occurrences of "a.b.c"; fixed = TRUE treats the dots literally
for (i in seq_along(occ_abc)) {                  # for each item of x
  if (length(occ_abc[[i]]) >= 2) {               # if there are 2 or more occurrences
    x[i] <- substr(x[i], 1, occ_abc[[i]][1] + 5) # keep everything up to the end of the first "a.b.c."
  }                                              # otherwise leave the item untouched
}
> x
[1] "LambMa, a.b.c." "LambMa, a.b.c., LaMa (shorter wording)"
The if...else part can very probably be replaced by an ifelse statement.
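A minimal sketch of what that ifelse version might look like. The names n_occ, first_pos and shortened are made up for illustration, and fixed = TRUE is an addition so that the dots are matched literally:

```r
x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)")

# positions of the literal "a.b.c" in each element of x
occ_abc <- gregexpr("a.b.c", x, fixed = TRUE)

# number of positions returned per element (a no-match gives a single -1, so length 1)
n_occ <- lengths(occ_abc)

# start of the first match in each element
first_pos <- vapply(occ_abc, function(m) m[1], integer(1))

# cut after the first "a.b.c." only where there are 2 or more occurrences
shortened <- ifelse(n_occ >= 2, substr(x, 1, first_pos + 5), x)
shortened
```

Because substr is vectorised over its arguments, no explicit loop is needed; elements with fewer than two matches are passed through unchanged by ifelse.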
Upvotes: 2
Reputation: 66834
You can use gsub to swap out the match if the pattern you specified matches. To avoid using a look-behind, you can capture the first a.b.c. and replace the whole match with it:
gsub("(a\\.b\\.c\\.).+(a\\.b\\.c)","\\1",x)
[1] "LambMa, a.b.c."
[2] "LambMa, a.b.c., LaMa (shorter wording)"
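One thing to watch for with the pattern above: anything following the last a.b.c would be kept, since the match stops there. A possible variant, assuming you want to drop everything after the first a.b.c. in that case, adds a trailing .* and uses sub, since at most one replacement per string is needed. The vector y and the function name cut_after_first are made up for illustration:

```r
x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)")

# keep the first "a.b.c." and drop the rest, but only if a second "a.b.c" follows
cut_after_first <- function(s) {
  sub("(a\\.b\\.c\\.).*a\\.b\\.c.*", "\\1", s)
}

res_x <- cut_after_first(x)

# hypothetical input with text after the second occurrence
y <- "A, a.b.c., B, a.b.c., tail"
res_y <- cut_after_first(y)
res_y
```

Strings with only one occurrence do not match the pattern at all, so sub returns them unchanged.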
Upvotes: 2