user17911

Reputation: 1341

Is there any named group capture mechanism in R while dealing with regular expressions?

As a basic example consider the following data.frame:

df <- data.frame(
    colval = c(
        "line-01_tel=0000000001",
        "line-01_tel=0000000002",
        "line-01_tel=0000000003"
    )
)

Let's imagine that "0000000001", "0000000002", "0000000003" are telephone numbers that we want to extract using a named capture group. In Python, here is how I would proceed:

import re


def main():
    test_lst = [
        "line-01_tel=0000000001",
        "line-01_tel=0000000002",
        "line-01_tel=0000000003"
    ]
    regexp = r"=(?P<telnum>\d+)$"
    prog = re.compile(regexp, re.IGNORECASE)
    for item in test_lst:
        result = prog.search(item)
        if result:
            print("telnum = {}".format(result.group("telnum")))


if __name__ == "__main__":
    main()

Is it possible to have the equivalent of r"=(?P<telnum>\d+)$" and result.group("telnum") indicated in the above code in R? In other words, is there any named group capture mechanism in R while dealing with regular expressions?

I checked the Strings chapter of the online book "R for Data Science". It covers functions such as str_match, str_sub, etc. for working with regular expressions, but I didn't see any example of named group capture.

Upvotes: 1

Views: 65

Answers (1)

G. Grothendieck

Reputation: 269852

The namedCapture package has that capability.

library(namedCapture)
str_match_named(df$colval, "(?P<telnum>\\d+)$")
##      telnum      
## [1,] "0000000001"
## [2,] "0000000002"
## [3,] "0000000003"

This also works in base R, without that package:

m <- regexec("(?P<telnum>\\d+)$", df$colval, perl = TRUE)
regmatches(df$colval, m)
## [[1]]
##                    telnum 
## "0000000001" "0000000001" 
##
## [[2]]
##                    telnum 
## "0000000002" "0000000002" 
##
## [[3]]
##                    telnum 
## "0000000003" "0000000003" 
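
To get closer to Python's result.group("telnum"), one option is to read the capture attributes that regexpr returns when perl = TRUE. The following is only a minimal sketch, assuming the same df as in the question; the variable names idx, st, len and telnum are illustrative and not part of the original answer.

```r
df <- data.frame(
    colval = c(
        "line-01_tel=0000000001",
        "line-01_tel=0000000002",
        "line-01_tel=0000000003"
    )
)

# With perl = TRUE, regexpr() attaches "capture.start", "capture.length"
# and "capture.names" attributes describing each capture group.
m <- regexpr("=(?P<telnum>\\d+)$", df$colval, perl = TRUE)

# Look up the column that belongs to the group named "telnum".
idx <- match("telnum", attr(m, "capture.names"))
st  <- attr(m, "capture.start")[, idx]
len <- attr(m, "capture.length")[, idx]

# Cut the group out of each string; mark non-matching rows as NA.
telnum <- substring(df$colval, st, st + len - 1)
telnum[m == -1] <- NA_character_

print(telnum)
```

The resulting character vector can then be stored as a new column, for example df$telnum <- telnum.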

Upvotes: 4
