Reputation: 1341
As a basic example consider the following data.frame:
df <- data.frame(
  colval = c(
    "line-01_tel=0000000001",
    "line-01_tel=0000000002",
    "line-01_tel=0000000003"
  )
)
Let's imagine that "0000000001", "0000000002" and "0000000003" are telephone numbers that we want to extract using a named capture group. In Python, here is how I would proceed:
import re

def main():
    test_lst = [
        "line-01_tel=0000000001",
        "line-01_tel=0000000002",
        "line-01_tel=0000000003"
    ]
    # Named group "telnum" captures the digits after "=" at the end of the string
    regexp = r"=(?P<telnum>\d+)$"
    prog = re.compile(regexp, re.IGNORECASE)
    for item in test_lst:
        result = prog.search(item)
        if result:
            print("telnum = {}".format(result.group("telnum")))

if __name__ == "__main__":
    main()
Is it possible to have the equivalent of r"=(?P<telnum>\d+)$" and result.group("telnum") from the above code in R? In other words, is there a named capture group mechanism for regular expressions in R?
I checked the Strings chapter of the online book "R for Data Science". It covers functions such as str_match, str_sub, etc. for working with regular expressions, but I didn't see any example of named group capture.
Upvotes: 1
Views: 65
Reputation: 269852
The namedCapture package has that capability.
library(namedCapture)
str_match_named(df$colval, "(?P<telnum>\\d+)$")
##      telnum
## [1,] "0000000001"
## [2,] "0000000002"
## [3,] "0000000003"
Also, even without that package, this works in base R. Each list element holds the full match first, followed by the named group:
m <- regexec("(?P<telnum>\\d+)$", df$colval, perl = TRUE)
regmatches(df$colval, m)
## [[1]]
##                    telnum
## "0000000001" "0000000001"
##
## [[2]]
##                    telnum
## "0000000002" "0000000002"
##
## [[3]]
##                    telnum
## "0000000003" "0000000003"
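To get something closer to Python's result.group("telnum"), that is, a plain character vector for one group looked up by name, one possible base R sketch uses regexpr() with perl = TRUE and the "capture.names", "capture.start" and "capture.length" attributes it attaches to its result. The variable names (idx, start, len, telnum) are only illustrative, and stringsAsFactors = FALSE is an added assumption so that the column is character on R versions before 4.0.

```r
df <- data.frame(
  colval = c(
    "line-01_tel=0000000001",
    "line-01_tel=0000000002",
    "line-01_tel=0000000003"
  ),
  stringsAsFactors = FALSE  # assumption: keep the column as character on R < 4.0
)

# With perl = TRUE, regexpr() records where each capture group matched
m <- regexpr("=(?P<telnum>\\d+)$", df$colval, perl = TRUE)

# Look the group up by name rather than by position
idx   <- match("telnum", attr(m, "capture.names"))
start <- attr(m, "capture.start")[, idx]
len   <- attr(m, "capture.length")[, idx]

# Cut the group out of each string; rows with no match become NA
telnum <- substring(df$colval, start, start + len - 1)
telnum[as.vector(m) == -1L] <- NA_character_

print(telnum)
```

Unlike the regexec() call, this only returns the group asked for, so there is no full-match element to skip over.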
Upvotes: 4