Stef

Reputation: 3

R regex match beginning and middle of a string

I have a vector of strings:

A <- c("Hello world", "Green 44", "Hot Beer", "Bip 6t")

I want to add an asterisk (*) at the beginning and at the end of every first word like this:

"*Hello* world", "*Green* 44", "*Hot* Beer", "*Bip* 6t"

It makes sense to use str_replace() from stringr. However, I am struggling with the regex to match the first word of each string.

The best I have managed so far is:

str_replace(A, "^([A-Z])", "*\\1*")
"*H*ello world", "*G*reen 44", "*H*ot Beer", "*B*ip 6t"

I expected this to be a straightforward task, but I am not getting along with regex.

Thanks!

Upvotes: 0

Views: 1200

Answers (2)

Wiktor Stribiżew

Reputation: 626845

You can use

sub("([[:alpha:]]+)", "*\\1*", A)
## => [1] "*Hello* world" "*Green* 44"    "*Hot* Beer"    "*Bip* 6t"     

The stringr equivalent is

library(stringr)
stringr::str_replace(A, "([[:alpha:]]+)", "*\\1*")
stringr::str_replace(A, "(\\p{L}+)", "*\\1*")


The ([[:alpha:]]+) pattern matches one or more letters and captures them into Group 1; the *\1* replacement (written "*\\1*" in an R string literal) replaces the match with * + the Group 1 value + *.

Note that sub finds and replaces only the first match, so only the first word in each element of the character vector is affected.
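To make the "first match only" point concrete, here is a minimal sketch contrasting sub with gsub on the vector from the question. The variable names first_only and every_run are just illustrative.

```r
# Vector from the question
A <- c("Hello world", "Green 44", "Hot Beer", "Bip 6t")

# sub() replaces only the first run of letters in each element
first_only <- sub("([[:alpha:]]+)", "*\\1*", A)

# gsub() replaces every run of letters, which is usually not wanted here
every_run <- gsub("([[:alpha:]]+)", "*\\1*", A)

print(first_only)
# [1] "*Hello* world" "*Green* 44"    "*Hot* Beer"    "*Bip* 6t"
print(every_run)
# [1] "*Hello* *world*" "*Green* 44"      "*Hot* *Beer*"    "*Bip* 6*t*"
```

With gsub, even the trailing "t" in "6t" gets wrapped, which is why sub (or str_replace rather than str_replace_all) fits the task.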

Notes

  • If you plan to wrap the word exactly at the start of a string (not just the "first word"), add ^ at the start of the pattern (e.g. sub("^([[:alpha:]]+)", "*\\1*", A))
  • If the word is a chunk of non-whitespace chars, use \S+ instead of [[:alpha:]]+ or \p{L}+ (e.g. sub("^(\\S+)", "*\\1*", A))
  • If the word is any chunk of letters or digits or underscores, you can use \w+, i.e. sub("^(\\w+)", "*\\1*", A)
  • If the word is any chunk of letters or digits but not underscores, you can use [[:alnum:]]+, i.e. sub("^([[:alnum:]]+)", "*\\1*", A)
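The variants in the notes above only differ on inputs that contain digits, underscores, or punctuation. The sketch below runs each anchored pattern on a small made-up vector B (not from the question) so the differences are visible side by side.

```r
# Hypothetical test inputs, chosen to separate the four patterns
B <- c("foo_bar baz", "x-ray 9", "42 is it")

letters_only <- sub("^([[:alpha:]]+)", "*\\1*", B)  # letters only
non_space    <- sub("^(\\S+)", "*\\1*", B)          # any non-whitespace chars
word_chars   <- sub("^(\\w+)", "*\\1*", B)          # letters, digits, underscore
alnum_only   <- sub("^([[:alnum:]]+)", "*\\1*", B)  # letters and digits

print(letters_only)
# [1] "*foo*_bar baz" "*x*-ray 9"     "42 is it"
print(non_space)
# [1] "*foo_bar* baz" "*x-ray* 9"     "*42* is it"
print(word_chars)
# [1] "*foo_bar* baz" "*x*-ray 9"     "*42* is it"
print(alnum_only)
# [1] "*foo*_bar baz" "*x*-ray 9"     "*42* is it"
```

Which one is right depends on what counts as a "word" in your data; for the vector in the question all four give the same result.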

Upvotes: 2

Allan Cameron

Reputation: 173813

You were almost there:

str_replace(A, "(^.*) ", "*\\1* ")
#> [1] "*Hello* world" "*Green* 44"    "*Hot* Beer"    "*Bip* 6t" 
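One caveat worth checking before reusing this pattern: .* is greedy, so "(^.*) " captures everything up to the last space, and a string with no space is not matched at all. The sketch below uses base sub (which behaves the same as str_replace for this pattern) on two made-up inputs; the names three_words and one_word are illustrative.

```r
# Hypothetical inputs that are not in the question's vector
three_words <- "Hot cold Beer"
one_word    <- "Hello"

# Greedy .* runs to the LAST space, so two words get wrapped
greedy <- sub("(^.*) ", "*\\1* ", three_words)

# No space in the string means no match, so nothing is replaced
no_space <- sub("(^.*) ", "*\\1* ", one_word)

# Matching non-whitespace from the start handles both cases
safer <- sub("^(\\S+)", "*\\1*", c(three_words, one_word))

print(greedy)    # [1] "*Hot cold* Beer"
print(no_space)  # [1] "Hello"
print(safer)     # [1] "*Hot* cold Beer" "*Hello*"
```

For the two-word strings in the question the greedy pattern gives the expected output, so this only matters if the real data is less regular.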

Upvotes: 1
