user1923975

Reputation: 1389

Add a character to the start of every word

I have a vector of strings and want to add a + before each word in each string.

library(stringr)
strings <- c('string one', 'string two', 'string three')
strings_new <- str_replace_all(strings, "\\b\\w", '+')
strings_new

Unfortunately, this replaces the first character of each word instead of adding the + symbol in front of it. I'm not familiar enough with regex to know how to solve this.

Any help would be great.

Thanks

Upvotes: 2

Views: 1151

Answers (4)

Wiktor Stribiżew

Reputation: 627100

You can use a base R solution with the PCRE pattern [[:<:]], which matches a starting word boundary, that is, a location between a non-word char (or the start of the string) and a word char:

strings <- c('string one', 'string two', 'string three')
gsub("[[:<:]]", "+", strings, perl=TRUE)
# => [1] "+string +one"   "+string +two"   "+string +three"

Or you can use the TRE regex (\w+), which matches and captures into Group 1 one or more word chars (letters, digits, or _), and replace with a + followed by the replacement backreference \1, which restores the consumed chars in the output:

gsub("(\\w+)", '+\\1', strings)
# => [1] "+string +one"   "+string +two"   "+string +three"

Note that you do not need a word boundary here: the first word char matched is already at a word boundary, and the subsequent word chars are consumed by the + quantifier.
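To illustrate which characters \w+ treats as part of a word, here is a small sketch on a made-up input (the string below is an assumption for illustration, not from the question). Digits and underscores count as word chars, while a hyphen splits a word in two:

```r
# Hypothetical input chosen to show what \w covers.
x <- "foo_bar 42 baz-qux"

# Same pattern and replacement as above: prefix every run of word chars with "+".
res <- gsub("(\\w+)", "+\\1", x)
res
# [1] "+foo_bar +42 +baz-+qux"
```

If hyphenated terms should count as a single word, the pattern would need adjusting; that is outside what the question asked for.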

And with the ICU-regex-based str_replace_all from stringr, you can use:

library(stringr)
str_replace_all(strings, "\\w+", '+\\0')
# => [1] "+string +one"   "+string +two"   "+string +three"

The \\0 is a replacement backreference to the whole match.

Upvotes: 4

ctwheels

Reputation: 22837

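You can also do this without the capture groups that others have shown, by using the regex \b(?=\w) with perl=TRUE. Here \b matches a word boundary and the lookahead (?=\w) requires a word char right after it, so only the start of each word matches and nothing is consumed.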

strings <- c('string one', 'string two', 'string three')
gsub("\\b(?=\\w)", "+", strings, perl=TRUE)

Result

[1] "+string +one"   "+string +two"   "+string +three"
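As a small sketch of why the lookahead matters, the example below runs the same pattern on a made-up string containing punctuation (the input is an assumption for illustration). The boundaries after "hello" and after "world" are followed by non-word chars, so the lookahead rejects them and no + is inserted there:

```r
# Hypothetical input with punctuation after each word.
x <- "hello, world!"

# Zero-width match: word boundary that is immediately followed by a word char.
res <- gsub("\\b(?=\\w)", "+", x, perl = TRUE)
res
# [1] "+hello, +world!"
```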

Upvotes: 1

KenHBS

Reputation: 7174

Another alternative would be to use strsplit() in combination with paste0():

res <- lapply(strsplit(strings, " "), function(x) paste0("+", x))
sapply(res, paste0, collapse = " ")
# [1] "+string +one"   "+string +two"   "+string +three"

For some people, the advantage may be that you don't have to wrestle with a regular expression. However, I would always prefer the direct regex statements by jasbner and Wiktor.
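The split-and-paste idea can also be wrapped in a single function. This is only a sketch: the helper name add_plus is hypothetical, and it assumes words are separated by exactly one space. With consecutive spaces, strsplit() yields empty strings, which then pick up a stray +:

```r
# Hypothetical helper: split each string on single spaces, prefix each piece
# with "+", and glue the pieces back together with single spaces.
add_plus <- function(x) {
  vapply(
    strsplit(x, " ", fixed = TRUE),
    function(words) paste0("+", words, collapse = " "),
    character(1)
  )
}

strings <- c('string one', 'string two', 'string three')
res <- add_plus(strings)
res
# [1] "+string +one"   "+string +two"   "+string +three"

# Two spaces in a row produce an empty piece, which also gets a "+".
res_double <- add_plus("a  b")
res_double
# [1] "+a + +b"
```

The regex-based answers do not have this issue, since they only act at the start of actual words.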

Upvotes: 0

jasbner

Reputation: 2283

Using a capture group is one way of doing this. Wrap the pattern in parentheses and recall the matched text with \\1 in the replacement.

library(stringr)
strings_new <- str_replace_all(strings, "(\\b\\w)", '+\\1')
strings_new
# [1] "+string +one"   "+string +two"   "+string +three"
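To see the difference between the original attempt and the capture-group version side by side, here is a sketch using base R gsub() with perl = TRUE as a stand-in for str_replace_all (an assumption made so the example runs without stringr):

```r
strings <- c('string one', 'string two', 'string three')

# Pattern from the question: the matched first letter is replaced by "+".
replaced <- gsub("\\b\\w", "+", strings, perl = TRUE)
replaced
# [1] "+tring +ne"   "+tring +wo"   "+tring +hree"

# Capture-group version: "+" is added and \\1 puts the matched letter back.
prefixed <- gsub("(\\b\\w)", "+\\1", strings, perl = TRUE)
prefixed
# [1] "+string +one"   "+string +two"   "+string +three"
```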

Upvotes: 7
