Marcin

Reputation: 8044

Using regexes with the grep function in R

Does anyone know how to extract x and y from the string "x and y" using the grep function (without the stringi package), where x and y can be arbitrary characters? I'm not very skilled with regular expressions. Thanks for any response.

Upvotes: 2

Views: 151

Answers (2)

hwnd

Reputation: 70732

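As @MrFlick commented, grep is not the right function for extracting these substrings: it only reports which elements of a vector match a pattern (or, with value = TRUE, returns those whole elements), not the parts that matched.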
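A quick way to see the difference is to run grep on a few sample strings. This small sketch (sample data borrowed from the example that follows) shows that grep gives back indices or whole elements, never the x and y pieces:

```r
x <- c("x and y", "abc and def", "foo and bar")

# grep() reports WHICH elements match the pattern (their indices)
idx <- grep("and", x)
print(idx)    # 1 2 3

# value = TRUE returns the whole matching elements, still not the substrings
vals <- grep("and", x, value = TRUE)
print(vals)   # "x and y" "abc and def" "foo and bar"
```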

Instead, you can use regmatches together with gregexpr and do something like this:

> x <- c('x and y', 'abc and def', 'foo and bar')
> regmatches(x, gregexpr('and(*SKIP)(*F)|\\w+', x, perl = TRUE))
# [[1]]
# [1] "x" "y"

# [[2]]
# [1] "abc" "def"

# [[3]]
# [1] "foo" "bar"
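The (*SKIP)(*F) idiom means "match and, then throw that match away and resume searching after it", so every remaining \w+ run is returned. One caveat worth checking: because and is not anchored to word boundaries, the pattern also skips those letters at the start of a longer word. The tightened variant with \b below is a hypothetical adjustment, not part of the original answer:

```r
x <- "andrew and bob"

# Pattern as written in the answer: "and" is not bounded by \b,
# so the leading "and" of "andrew" is skipped too
loose <- regmatches(x, gregexpr("and(*SKIP)(*F)|\\w+", x, perl = TRUE))[[1]]

# Hypothetical variant: \b on both sides so only the standalone word "and" is skipped
strict <- regmatches(x, gregexpr("\\band\\b(*SKIP)(*F)|\\w+", x, perl = TRUE))[[1]]

print(loose)   # "rew" "bob"
print(strict)  # "andrew" "bob"
```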

Or, if the separator " and " is always exactly the same, use strsplit as suggested in the comments:

> x <- c('x and y', 'abc and def', 'foo and bar')
> strsplit(x, ' and ', fixed = TRUE)
# [[1]]
# [1] "x" "y"

# [[2]]
# [1] "abc" "def"

# [[3]]
# [1] "foo" "bar"
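If the spacing around "and" is not guaranteed to be a single space, a regular-expression separator is a small variation on the fixed split. This is a sketch assuming every string contains exactly one "and", so the pieces can be stacked into a two-column matrix:

```r
x <- c("x and y", "a and  b", " C and d")

# trimws() drops stray leading/trailing blanks; the regex separator
# tolerates one or more whitespace characters on either side of "and"
parts <- strsplit(trimws(x), "\\s+and\\s+")

# Each element has exactly two pieces, so rbind them into a 3 x 2 matrix
m <- do.call(rbind, parts)
print(m)
```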

Upvotes: 4

hrbrmstr

Reputation: 78792

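The regex below matches a run of letters, the word "and" surrounded by blanks, and another run of letters. The two letter runs are captured in groups, which regexec and regmatches then extract: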

txt <- c("x and y", "a and  b", " C and d", "qq and rr")

matches <- regexec("([[:alpha:]]+)[[:blank:]]+and[[:blank:]]+([[:alpha:]]+)", txt)

regmatches(txt, matches)[[1]][2:3]
## [1] "x" "y"

regmatches(txt, matches)[[2]][2:3]
## [1] "a" "b"

regmatches(txt, matches)[[3]][2:3]
## [1] "C" "d"

regmatches(txt, matches)[[4]][2:3]
## [1] "qq" "rr"

([[:alpha:]]+) matches one or more alphabetic characters and places them in a capture group. [[:blank:]]+ matches one or more blank characters (spaces or tabs; unlike [[:space:]], it does not match newlines). There are less verbose ways to write these regexes, but I find the expanded forms easier to grok when the code will be read by people who aren't familiar with regexes.
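For comparison, here is one of the terser spellings, using the \w and \s shorthand classes that R's default regex engine accepts as extensions. Note it is slightly looser than the answer's pattern: \w also matches digits and underscores, and \s matches any whitespace rather than only spaces and tabs:

```r
txt <- c("x and y", "a and  b", " C and d", "qq and rr")

# \w = word character ([[:alnum:]_]), \s = any whitespace
m <- regmatches(txt, regexec("(\\w+)\\s+and\\s+(\\w+)", txt))
print(m[[2]][2:3])   # "a" "b"

# [[:blank:]] covers only space and tab, not newline
blank <- grepl("^[[:blank:]]$", c(" ", "\t", "\n"))
print(blank)         # TRUE TRUE FALSE
```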

I also didn't need to call regmatches four times; copying and pasting was just quicker for a toy example.
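A single call is enough, since regmatches returns one element per input string. A sketch of collecting all the capture groups at once, assuming every string matches so each element has the full match plus two groups:

```r
txt <- c("x and y", "a and  b", " C and d", "qq and rr")
matches <- regexec("([[:alpha:]]+)[[:blank:]]+and[[:blank:]]+([[:alpha:]]+)", txt)

# One regmatches() call: for each string, the full match followed by the two groups
groups <- regmatches(txt, matches)

# Keep positions 2:3 (the capture groups); vapply gives a 2 x 4 matrix,
# so transpose it to get one row per input string
pairs <- t(vapply(groups, function(g) g[2:3], character(2)))
print(pairs)
```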

Upvotes: 4
