Helen

Reputation: 607

Regex in R: extract words from a string

I have a string that I want to extract the names from, but I cannot seem to get everything right:

str = "JaMes + Heather + Lynn + log(Barry) + Sister2"
str_list = strsplit(x=str, split="\\+")
# str_list[[1]] is "JaMes " " Heather " " Lynn " " log(Barry) " " Sister2"

I don't want "log(Barry)" in the output, just "Barry".

Upvotes: 1

Views: 65

Answers (3)

IceCreamToucan

Reputation: 28685

You can use gsub to turn anything of the form 'function_name(object)' into just 'object'. After that, splitting on ' + ' gives the desired output.

strsplit(gsub('\\w+\\((.*)\\)', '\\1', str), ' + ', fixed = TRUE)[[1]]
# [1] "JaMes"   "Heather" "Lynn"    "Barry"   "Sister2"
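One caveat, shown as a sketch on a hypothetical input that is not from the question: '.*' is greedy, so when the string holds more than one wrapped term the match runs from the first "(" to the last ")". Swapping in '[^()]*' (a run of characters that are not brackets) keeps each match inside a single pair of brackets:

```r
# Hypothetical input with two wrapped terms (not from the question)
str2 <- "JaMes + log(Barry) + exp(Lynn)"

# Greedy '.*' captures everything between the first "(" and the last ")"
greedy <- gsub('\\w+\\((.*)\\)', '\\1', str2)
print(greedy)
# [1] "JaMes + Barry) + exp(Lynn"

# '[^()]*' cannot cross a bracket, so each call is unwrapped separately
unwrapped <- gsub('\\w+\\(([^()]*)\\)', '\\1', str2)
parts <- strsplit(unwrapped, ' + ', fixed = TRUE)[[1]]
print(parts)
# [1] "JaMes" "Barry" "Lynn"
```

For the single log(Barry) in the question both patterns give the same result; the difference only shows up with several wrapped terms.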

Upvotes: 2

akrun

Reputation: 887118

One option is to remove "log(" and the closing bracket with gsub or sub:

gsub('log\\(|\\)', '', str)
#[1] "JaMes + Heather + Lynn + Barry + Sister2"

or with sub

sub('log\\(([^)]+)\\)', '\\1', str)
#[1] "JaMes + Heather + Lynn + Barry + Sister2"
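Both calls return one cleaned-up string rather than the separate names. Passing the result to strsplit on the literal ' + ' separator, as in the question, finishes the job. A minimal sketch using the gsub version:

```r
str <- "JaMes + Heather + Lynn + log(Barry) + Sister2"

# Drop "log(" and every ")" from the string
cleaned <- gsub('log\\(|\\)', '', str)

# Split on the literal " + " separator (fixed = TRUE, so "+" is not a regex)
name_vec <- strsplit(cleaned, ' + ', fixed = TRUE)[[1]]
print(name_vec)
# [1] "JaMes"   "Heather" "Lynn"    "Barry"   "Sister2"
```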

Or, with gregexpr/regmatches, extract every word and drop "log":

setdiff(regmatches(str, gregexpr('\\w+', str))[[1]], "log")
#[1] "JaMes"   "Heather" "Lynn"    "Barry"   "Sister2"
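Note that setdiff also removes duplicates, so a name that appears twice comes back only once. If repeats matter, filtering with != keeps them. A sketch on a hypothetical input with a repeated name:

```r
# Hypothetical input with a repeated name (not from the question)
str3 <- "Lynn + log(Barry) + Lynn"
words <- regmatches(str3, gregexpr('\\w+', str3))[[1]]

# setdiff() returns unique values, so the second "Lynn" is lost
dedup <- setdiff(words, "log")
print(dedup)
# [1] "Lynn"  "Barry"

# Logical filtering drops only "log" and keeps repeats in their original order
kept <- words[words != "log"]
print(kept)
# [1] "Lynn"  "Barry" "Lynn"
```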

If we need the individual words, stringr gives the same result:

library(stringr)
setdiff(str_extract_all(str, "\\w+")[[1]], "log")
#[1] "JaMes"   "Heather" "Lynn"    "Barry"   "Sister2"

Or use a regex lookaround

str_extract_all(str, "\\w+\\b(?!\\()")[[1]]
#[1] "JaMes"   "Heather" "Lynn"    "Barry"   "Sister2"
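The lookahead does not need stringr: base R's regmatches with gregexpr(perl = TRUE) accepts the same pattern. Because the pattern drops any word directly followed by "(", it also skips function names other than log. A sketch, where the second input string is hypothetical:

```r
str <- "JaMes + Heather + Lynn + log(Barry) + Sister2"

# Keep a word only if it is not immediately followed by "(" (PCRE lookahead)
pat <- "\\w+\\b(?!\\()"
from_question <- regmatches(str, gregexpr(pat, str, perl = TRUE))[[1]]
print(from_question)
# [1] "JaMes"   "Heather" "Lynn"    "Barry"   "Sister2"

# Hypothetical input with other wrappers: exp and sqrt are dropped too
str2 <- "JaMes + exp(Lynn) + sqrt(Barry)"
other_wrappers <- regmatches(str2, gregexpr(pat, str2, perl = TRUE))[[1]]
print(other_wrappers)
# [1] "JaMes" "Lynn"  "Barry"
```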

Upvotes: 2

Roman

Reputation: 17648

You can remove "log" first and then extract the words with stringi:

library(stringi)
stri_extract_all_words(gsub("log", "", str))[[1]]
#[1] "JaMes"   "Heather" "Lynn"    "Barry"   "Sister2"
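Because the pattern "log" is not anchored, gsub also strips those three letters from inside a name. Below is a sketch with a hypothetical name "Analog" that removes "log(" only where "log" starts a word (perl = TRUE for the word boundary). To stay runnable without extra packages, it extracts the words with base R's regmatches instead of stringi:

```r
# Hypothetical input containing "log" inside a name (not from the question)
str4 <- "Analog + log(Barry)"

# Unanchored: also removes the "log" at the end of "Analog"
naive <- gsub("log", "", str4)
print(naive)
# [1] "Ana + (Barry)"

# Anchored: only remove "log(" when "log" begins a word
cleaned4 <- gsub("\\blog\\(", "", str4, perl = TRUE)
words4 <- regmatches(cleaned4, gregexpr("\\w+", cleaned4))[[1]]
print(words4)
# [1] "Analog" "Barry"
```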

Upvotes: 2
