user3349993

Reputation: 309

Regex in R: matching the string before a sequence of characters

I want to extract a part of the string that comes before a certain word. E.g. I want to get everything before ", useless".

a <- "Experiment A, useless (03/25)"
b <- grep('^[^useless]+', a, perl=T)
regmatches(a,b)

This should return "Experiment A".

However, this doesn't work. R gives "Error in substring(x[ind], so, eo) : invalid substring arguments".

Upvotes: 2

Views: 4853

Answers (4)

Shenglin Chen

Reputation: 4554

sub('(.*),.*', '\\1', a, perl = TRUE)
[1] "Experiment A"

Upvotes: 0

Liun

Reputation: 117

sub("(\\w*), useless.*","\\1",a)

Upvotes: 1

Matthew Lundberg

Reputation: 42669

Lookahead is made for this:

b <- regexpr(".*(?=, useless)", a, perl=TRUE)
regmatches(a, b)
## [1] "Experiment A"

.* matches any sequence of characters, and the lookahead (?=, useless) restricts the match to text that is immediately followed by the string ", useless". The lookahead only checks for that string; it is not included in the result.
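One thing worth checking is what happens when ", useless" appears more than once. The sketch below uses a made-up input string (x is an assumption, not from the question) to compare the greedy .* above with a lazy .*? anchored at the start of the string:

```r
# Hypothetical input in which the marker ", useless" occurs twice
x <- "Experiment A, useless (03/25), useless again"

# Greedy .* takes as much as possible, so it stops before the LAST ", useless"
greedy <- regmatches(x, regexpr(".*(?=, useless)", x, perl = TRUE))

# Lazy .*? anchored at the start stops before the FIRST ", useless"
lazy <- regmatches(x, regexpr("^.*?(?=, useless)", x, perl = TRUE))

print(greedy)
## [1] "Experiment A, useless (03/25)"
print(lazy)
## [1] "Experiment A"
```

For the single-occurrence string in the question both patterns give the same result, so the lazy form only matters if the marker can repeat.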

Upvotes: 3

akrun

Reputation: 887241

We can use sub to match the comma, followed by zero or more whitespace characters (\\s*), then the word 'useless' (\\b marks the word boundary) and any characters that follow (.*), and replace the whole match with an empty string ("").

sub(",\\s*useless\\b.*", "", a)
#[1] "Experiment A"
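Since sub is vectorized, the same call should work on a whole character vector. Here is a small sketch with made-up inputs (the vector v is an assumption for illustration): elements without ", useless" come back unchanged, whereas regmatches with regexpr drops the elements that do not match.

```r
# Hypothetical inputs: with a space, without a space, and without the marker
v <- c("Experiment A, useless (03/25)", "Experiment B,useless", "Experiment C")

# sub returns one result per input; non-matching elements are left as they are
cleaned <- sub(",\\s*useless\\b.*", "", v)
print(cleaned)
## [1] "Experiment A" "Experiment B" "Experiment C"

# regmatches + regexpr returns only the matches, so the result is shorter
matched <- regmatches(v, regexpr(".*(?=,\\s*useless)", v, perl = TRUE))
print(matched)
## [1] "Experiment A" "Experiment B"
```

If the output has to line up with the input (for example as a new data frame column), the sub version is probably the safer choice.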

Upvotes: 4
