Shreya Agarwal

Reputation: 716

Extract text after a symbol and before the first non-word character in R

How should I extract the text between the first @ and the next space? My code is below. It extracts the words after every @, but I only want the text after the first @.

text

@pisa, what's up?
@italy @spain we're praying for you.

ideal result

pisa
italy

my code

regex <- "@([A-Za-z]+[A-Za-z0-9])"

words <- str_extract_all(text, regex)

print(words)

output

@pisa
@italy @spain 

Upvotes: 1

Views: 112

Answers (1)

akrun

Reputation: 887028

We can use a regex lookbehind, i.e. match the word that is preceded by the symbol @. Note that str_extract (unlike str_extract_all) returns only the first match in each string.

library(stringr)
str_extract(text, "(?<=@)\\w+")
#[1] "pisa"  "italy"

It can also be written with a negated character class that stops at the first comma or space

str_extract(text, "(?<=@)[^, ]+")
#[1] "pisa"  "italy"

Or in base R, using sub, capture the word after the leading @ and, in the replacement, specify the backreference (\\1) of the captured group. The ^ anchor assumes each string starts with @.

sub("^@(\\w+).*", "\\1", text)
#[1] "pisa"  "italy"

Also, another option is regmatches/regexpr

regmatches(text, regexpr('(?<=@)\\w+', text, perl = TRUE))
#[1] "pisa"  "italy"
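
As a small base R sketch of why the code in the question returned every handle: regexpr locates only the first match in each string, while gregexpr locates all of them, which mirrors the difference between str_extract and str_extract_all. The variable names `pattern`, `first` and `all_matches` are illustrative only and not part of the original answer.

```r
text <- c("@pisa, what's up?", "@italy @spain we're praying for you.")
pattern <- "(?<=@)\\w+"

# regexpr() locates only the first match in each string
first <- regmatches(text, regexpr(pattern, text, perl = TRUE))

# gregexpr() locates every match, so regmatches() returns a list
# with one character vector per input string
all_matches <- regmatches(text, gregexpr(pattern, text, perl = TRUE))

print(first)
print(all_matches)
```

One caveat: regmatches combined with regexpr drops strings that contain no match, so the result can be shorter than the input.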

Or with trimws (the whitespace argument requires R >= 3.6.0)

trimws(text, whitespace = '@|,? .*')
#[1] "pisa"  "italy"

data

text <- c("@pisa, what's up?", "@italy @spain we're praying for you.")

Upvotes: 1
