Emmet Brown

Reputation: 456

Regex matches processing in R

I would like to extract the two matching groups using R. This is what I have so far, but it returns the whole match instead of the individual groups:

Code:

str = '123abc'
vector <- gregexpr('(?<first>\\d+)(?<second>\\w+)', str, perl=TRUE)
regmatches(str, vector)

Result:

[[1]]
[1] "123abc"

I want the result to be something like this:

[1] "123"
[2] "abc"

Upvotes: 0

Views: 130

Answers (4)

G. Grothendieck

Reputation: 270298

Try this:

> library(gsubfn)
> strapplyc("123abc", '(\\d+)(\\w+)')[[1]]
[1] "123" "abc"

Upvotes: 0

Matthew Lundberg

Reputation: 42689

I've renamed your string to s to avoid clobbering the built-in str function. Here is one approach:

library(stringr)
s <- '123abc'
reg <- '([[:digit:]]+)([[:alpha:]]+)'

complete <- unlist(str_extract_all(s, reg))
partials <- unlist(str_match_all(s, reg))
partials <- partials[!(partials %in% complete)]

partials
[1] "123" "abc"
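The filtering step above removes the full match by value. As a rough base R sketch of the same idea (not part of the original answer, and assuming the pattern does match), you can drop the full match by position instead: regexec reports the full match first, followed by each capture group. The names m and groups are only illustrative.

```r
s <- "123abc"
reg <- "([[:digit:]]+)([[:alpha:]]+)"

# regexec() records the position of the full match and of every capture group;
# regmatches() converts those positions into substrings.
m <- regmatches(s, regexec(reg, s))[[1]]

# The first element is the full match, so dropping it leaves only the groups.
groups <- m[-1]
print(groups)
```

Dropping by position avoids losing a group that happens to be identical to the full match, which filtering with %in% would discard.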

Upvotes: 1

Florian

Reputation: 91

I'm not sure whether you have a specific reason for using regmatches, such as importing the expressions in that format. If every entry shares the same well-defined groups, you can extract them like this:

x <- "123abc"
sub("([[:digit:]]+)[[:alpha:]]+","\\1",x)
sub("[[:digit:]]+([[:alpha:]]+)","\\1",x)

Result:

[1] "123"
[1] "abc"

That is, match the entire structure of the string, enclose the part you want to keep in round brackets, and replace the whole match with a backreference to that group ("\\1").
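A small variation on the same backreference idea, as a sketch only: one pattern with both groups, referenced as "\\1" and "\\2", applied to a vector. The second input string and the variable names are made up for illustration, and the ^ and $ anchors are an added assumption that each entry consists of nothing but digits followed by letters.

```r
x <- c("123abc", "45xyz")

# One pattern with two capture groups, anchored to the whole string.
pattern <- "^([[:digit:]]+)([[:alpha:]]+)$"

# Replace the whole match with the first or the second group.
first  <- sub(pattern, "\\1", x)
second <- sub(pattern, "\\2", x)
print(first)
print(second)

# Caveat: sub() returns the input unchanged when the pattern does not match.
unmatched <- sub(pattern, "\\1", "abc123")
print(unmatched)
```

Because a non-matching string comes back unchanged, it may be worth checking with grepl(pattern, x) first if your inputs are not guaranteed to follow the structure.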

Upvotes: 2

Erik Shilts

Reputation: 4509

Depending on how well structured your inputs are, you may want to use strsplit to split the string.

See ?strsplit for the documentation.
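As a possible sketch of the strsplit route for the example in the question, assuming every input is a run of digits followed directly by a run of letters: split at the zero-width boundary between a digit and a letter using lookaround assertions, which require perl = TRUE.

```r
x <- "123abc"

# Split where a digit is immediately followed by a letter.
# The lookbehind and lookahead consume no characters, so nothing is lost.
parts <- strsplit(x, "(?<=[0-9])(?=[[:alpha:]])", perl = TRUE)[[1]]
print(parts)
```

strsplit handles zero-length matches in its own way, so it is worth verifying the result on your real data before relying on it.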

Upvotes: 0
