Reputation: 109
I wrote a regex at https://regex101.com/r/R8ObNk/1: (^[^\\]*)\\t([^\\]*)\\t([^\\]*)\\t([^\\]*)\\t([^\\]*)(.*)
with a back reference to capture group 5, i.e. "\5".
For some reason, when I use this regex in R with gsub, I do not get the correct result.
Here is the dput for the first line of the data that I am trying to back reference:
structure(list(value = "19-22\t\t4\tP,G\tDOB_TT\t\tTime of Birth\t\t126\t \t0000-2359 Time of Birth"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -1L))
This is the gsub call on the line above: gsub(pattern = "(^[^\\]*)\\t([^\\]*)\\t([^\\]*)\\t([^\\]*)\\t([^\\]*)(.*)", replacement = "\\5", x = a$value)
I know you are supposed to add another "\" when working with regex in R, but that still didn't work.
The intended result of the gsub is "DOB_TT", the 5th capture group.
Upvotes: 1
Views: 209
Reputation: 57686
You don't actually need regexes in this case, since your data is structured:
parsed <- read.delim(text=a$value, header=FALSE)
parsed$V5
# [1] "DOB_TT"
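For anyone who wants to run this without the original tibble, here is a minimal self-contained sketch. It assumes a plain data.frame stands in for the question's `a` (the column name `value` is taken from the dput), and it adds a `strsplit` variant as an alternative that is not part of the answer above:

```r
# Stand-in for the question's data: one tab-delimited string in column `value`.
a <- data.frame(
  value = "19-22\t\t4\tP,G\tDOB_TT\t\tTime of Birth\t\t126\t \t0000-2359 Time of Birth",
  stringsAsFactors = FALSE
)

# Parse the string as tab-delimited text; columns are named V1, V2, ...
parsed <- read.delim(text = a$value, header = FALSE, stringsAsFactors = FALSE)
field5_delim <- parsed$V5

# Alternative: split on the literal tab character and take the 5th piece.
field5_split <- strsplit(a$value, "\t", fixed = TRUE)[[1]][5]

print(field5_delim)
print(field5_split)
```

The `strsplit` version keeps every field as character, which can be handy if you do not want read.delim to guess column types.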
Upvotes: 3
Reputation: 206253
You need to be careful with escape characters. Note that R uses an extra "\" in string literals that the website will not understand. And when you see a string like
x <- "a\tb"
in R, there is no literal backslash in the string. The \t is the escape sequence for a single tab character, so nchar(x) returns 3, not 4: the backslash and the t together make one tab character. So given your data, what you really want is
gsub(pattern = "(^[^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)(.*)",
replacement = "\\5", x = a$value)
You do not need an extra \ for the tabs because tab characters aren't special in a regular expression. They are just ordinary characters.
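To check the points above yourself, here is a small sketch. The variable names `y`, `value`, and `field5` are made up for illustration, and `value` is copied from the dput in the question:

```r
# "\t" in an R string literal is ONE character: a tab.
x <- "a\tb"
len_x <- nchar(x)

# "\\t" in an R string literal is TWO characters: a backslash and a "t".
y <- "a\\tb"
len_y <- nchar(y)

# Only x contains an actual tab character.
x_has_tab <- grepl("\t", x, fixed = TRUE)
y_has_tab <- grepl("\t", y, fixed = TRUE)

# The question's data contains real tabs, so match on real tabs.
value <- "19-22\t\t4\tP,G\tDOB_TT\t\tTime of Birth\t\t126\t \t0000-2359 Time of Birth"
field5 <- gsub(
  pattern = "(^[^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)(.*)",
  replacement = "\\5",
  x = value
)

print(c(len_x, len_y))
print(c(x_has_tab, y_has_tab))
print(field5)
```

Only the replacement needs the doubled backslash ("\\5"), because there the backslash itself has to reach gsub as part of the back reference.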
Upvotes: 2