Reputation: 1389
I have a vector of strings and want to add a + before each word in each string.
library(stringr)
strings <- c('string one', 'string two', 'string three')
strings_new <- str_replace_all(strings, "\\b\\w", '+')
strings_new
Unfortunately, this replaces the first character of each word instead of adding the + symbol before it. I'm not familiar enough with regex to know how to solve this.
Any help would be great.
Thanks
Upvotes: 2
Views: 1151
Reputation: 627100
You may use a base R solution with the PCRE regex [[:<:]], which matches the starting word boundary, a location between a non-word char and a word char:
strings <- c('string one', 'string two', 'string three')
gsub("[[:<:]]", "+", strings, perl=TRUE)
# => [1] "+string +one" "+string +two" "+string +three"
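To illustrate why a start-of-word assertion is used rather than a plain \b, here is a minimal base R sketch. The variable names start_only and both_ends are mine, not from the answer above.

```r
strings <- c('string one', 'string two', 'string three')

# [[:<:]] asserts only the start of a word, so one + is inserted per word.
start_only <- gsub("[[:<:]]", "+", strings, perl = TRUE)

# A plain \b also matches at the end of a word, so extra + signs appear
# after the words as well.
both_ends <- gsub("\\b", "+", strings, perl = TRUE)

print(start_only)
print(both_ends)
```

Comparing the two printed vectors shows that only the first pattern gives the requested output.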
Or, you may use a (\w+) TRE regex (that matches and captures into Group 1 any one or more word chars, i.e. letters, digits, or _) to replace with a + and a replacement backreference \1 to restore the consumed chars in the output:
gsub("(\\w+)", '+\\1', strings)
# => [1] "+string +one" "+string +two" "+string +three"
Note you do not need a word boundary here, since the first word char matched is already at a word boundary and the subsequent word chars are consumed by the + quantifier. See the regex demo.
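Since \w covers digits and underscores as well as letters, tokens such as identifiers and numbers are prefixed as a whole. A small sketch, using a made-up input vector (items is a hypothetical name, not from the question):

```r
# Hypothetical input mixing letters, digits, and underscores.
items <- c("item_1 costs 20", "x2 y_z")

# Each run of word chars is captured and re-emitted after a +.
prefixed <- gsub("(\\w+)", "+\\1", items)
print(prefixed)
# [1] "+item_1 +costs +20" "+x2 +y_z"
```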
And with the ICU regex based str_replace_all, you may use:
> str_replace_all(strings, "\\w+", '+\\0')
[1] "+string +one" "+string +two" "+string +three"
The \\0 is a replacement backreference to the whole match.
Upvotes: 4
Reputation: 22837
You can do this without capture groups as well (as others have shown) by using the regex \b(?=\w) with perl=TRUE, as shown below.
strings <- c('string one', 'string two', 'string three')
gsub("\\b(?=\\w)", "+", strings, perl=TRUE)
Result
[1] "+string +one" "+string +two" "+string +three"
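One thing to keep in mind with word-boundary patterns is that punctuation inside a word, such as an apostrophe, creates an additional boundary. A short sketch with made-up inputs (phrases is a hypothetical vector, not from the question):

```r
# Hypothetical inputs containing punctuation.
phrases <- c("hello, world!", "don't stop")

# The lookahead (?=\w) keeps only boundaries that are followed by a word char.
marked <- gsub("\\b(?=\\w)", "+", phrases, perl = TRUE)
print(marked)
# [1] "+hello, +world!" "+don'+t +stop"
```

If "don't" should count as one word, splitting on whitespace may be a better fit than a word boundary.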
Upvotes: 1
Reputation: 7174
Another alternative would be to use strsplit() in combination with paste0():
res <- lapply(strsplit(strings, " "), function(x) paste0("+", x))
sapply(res, paste0, collapse = " ")
# [1] "+string +one" "+string +two" "+string +three"
For some people, the advantage may be that you don't have to wrestle with a regular expression. However, I would always prefer the direct regex statements by Jasbner and Wictor.
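The split-and-paste idea can also be wrapped in a small helper. This is only a sketch: add_prefix and its prefix argument are hypothetical names, and it assumes words are separated by single spaces.

```r
# Hypothetical helper: prefix every space-separated token with `prefix`.
add_prefix <- function(x, prefix = "+") {
  tokens <- strsplit(x, " ", fixed = TRUE)
  # paste0() prefixes each token, then collapse joins them back with spaces.
  vapply(tokens,
         function(tok) paste0(prefix, tok, collapse = " "),
         character(1))
}

strings <- c('string one', 'string two', 'string three')
result <- add_prefix(strings)
print(result)
# [1] "+string +one" "+string +two" "+string +three"
```

vapply() is used instead of sapply() so that the return type is always a character vector.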
Upvotes: 0
Reputation: 2283
Using capture groups is one way of doing this. Group with parentheses and recall with \\1.
strings_new <- str_replace_all(strings, "(\\b\\w)", '+\\1')
strings_new
[1] "+string +one" "+string +two" "+string +three"
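For readers without stringr installed, the same capture-group idea can be sketched in base R. This variant is my own adaptation rather than part of the answer above. It passes perl = TRUE so that \b is handled by the PCRE engine.

```r
strings <- c('string one', 'string two', 'string three')

# Capture the first word char after a boundary and put it back after the +.
strings_new <- gsub("\\b(\\w)", "+\\1", strings, perl = TRUE)
print(strings_new)
# [1] "+string +one" "+string +two" "+string +three"
```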
Upvotes: 7