Reputation: 195
I've read a few of the other questions on capture groups in R regular expressions, and I'm not having much luck.
I have a string:
127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] \"OPTIONS * HTTP/1.0\" 200 - \"-\" \"Apache/2.2.14 (Ubuntu) PHP/5.3.2-1ubuntu4.24 with Suhosin-Patch mod_ssl/2.2.14 OpenSSL/0.9.8k mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1 (internal dummy connection)\"
from which I am trying to capture the timestamp:
07/Dec/2014:06:43:43 -0800
The following call returns a match, but the match still includes the brackets:
regmatches(x,regexpr('\\[([\\w:/]+\\s[+\\-]\\d{4})\\]',x,perl=TRUE))
[1] "[07/Dec/2014:06:43:43 -0800]"
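With perl = TRUE, base R's regexpr() also records where each capture group matched, in the capture.start and capture.length attributes of its result. A minimal sketch that uses those attributes to pull out group 1 without the brackets, assuming only the first part of the log line from the question as input:

```r
# First part of the log line from the question
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -'

# Same pattern as above; group 1 is the timestamp without the brackets
pat <- "\\[([\\w:/]+\\s[+\\-]\\d{4})\\]"

m <- regexpr(pat, x, perl = TRUE)

# capture.start / capture.length are matrices:
# one row per input string, one column per capture group
start <- attr(m, "capture.start")[, 1]
len   <- attr(m, "capture.length")[, 1]

# Cut group 1 out of the original string
timestamp <- substring(x, start, start + len - 1)
timestamp
```

This stays on the same regexpr() call that already works, so it only needs PCRE support, not a newer R or an extra package.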
I've tried to capture just the group itself with str_match, using several variations of this regex:
str_match(x, "\\[([\\w:/]+\\s[+\\-]\\d{4})\\]")
[,1] [,2]
[1,] NA NA
No luck. Several variations of this regex test correctly in most online regex testers, so I don't think the regex itself is the problem.
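One plausible reason for the NA, stated as an assumption rather than something confirmed here: if the stringr version in use handed the pattern to base R's default POSIX (TRE) engine instead of PCRE, then inside a bracket expression \w is not a word-character shorthand, so [\w:/]+ cannot match the digits. Online testers usually use PCRE-like engines, which would explain the mismatch. In R 3.3.0 and later, base R's regexec() accepts perl = TRUE and returns the whole match followed by each capture group, so a sketch is:

```r
# First part of the log line from the question
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -'

pat <- "\\[([\\w:/]+\\s[+\\-]\\d{4})\\]"

# perl = TRUE selects PCRE, where \w, \s and \d are valid inside [...] too
# (the perl argument of regexec() requires R >= 3.3.0)
m <- regmatches(x, regexec(pat, x, perl = TRUE))

# m[[1]][1] is the whole match (with brackets), m[[1]][2] is capture group 1
timestamp <- m[[1]][2]
timestamp
```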
How can I get just the timestamp so I can pass it to strptime, without doing something like using gsub to strip the brackets? gsub doesn't get the group for me and neither does str_match, so what am I missing? The ideal output would be
07/Dec/2014:06:43:43 -0800
which I could then use in strptime.
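Once the bracket-free string is available, its layout can be spelled out for strptime() as "%d/%b/%Y:%H:%M:%S %z", where %z reads the -0800 UTC offset (R's strptime documentation lists %z as supported for input). Because %b matches abbreviated month names in the current locale, the sketch below temporarily switches LC_TIME to "C", which assumes the logs use English month abbreviations. The helper name parse_log_time is hypothetical.

```r
# Hypothetical helper: parse an Apache-style timestamp such as
# "07/Dec/2014:06:43:43 -0800" into a POSIXlt in time zone tz
parse_log_time <- function(s, tz = "UTC") {
  # %b is locale-dependent; assume English month abbreviations ("Dec")
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")

  # %z consumes the signed UTC offset, so the result is shifted into tz
  strptime(s, format = "%d/%b/%Y:%H:%M:%S %z", tz = tz)
}

ts <- parse_log_time("07/Dec/2014:06:43:43 -0800")
format(ts, "%Y-%m-%d %H:%M:%S %Z")
```

Since 06:43:43 at UTC-8 is 14:43:43 UTC, the parsed value should print as 2014-12-07 14:43:43 in UTC.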
Upvotes: 3
Views: 237
Reputation: 81693
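It is very easy with sub: you can replace the whole string with the captured group. (The character class uses A-Za-z, because the range A-z would also match [, \, ], ^, _ and the backtick, which sit between Z and a in ASCII.)
sub(".*\\[([A-Za-z0-9:/]+\\s[+-]\\d{4})\\].*", "\\1", x)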
# [1] "07/Dec/2014:06:43:43 -0800"
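One caveat with sub(): when the pattern does not match, it returns the input unchanged rather than NA, so a log line without a timestamp would come back as the whole line. A minimal sketch that guards against this with grepl(), applied to a small vector whose second element is an illustrative non-matching line:

```r
lines <- c(
  '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -',
  "a line without a timestamp"   # illustrative non-matching input
)

pat <- ".*\\[([A-Za-z0-9:/]+\\s[+-]\\d{4})\\].*"

# sub() leaves non-matching strings untouched, so mark those as NA explicitly
timestamps <- ifelse(grepl(pat, lines), sub(pat, "\\1", lines), NA_character_)
timestamps
```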
Upvotes: 1
Reputation: 92292
Try the qdapRegex package; it has a dedicated function for extracting text inside square brackets:
library(qdapRegex)
rm_square(x, extract = TRUE)[[1]]
## [1] "07/Dec/2014:06:43:43 -0800"
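If adding a package is not an option, a similar "text inside square brackets" extraction can be sketched in base R with PCRE lookarounds. This is a swapped-in equivalent for this input, not how rm_square() is implemented. gregexpr() finds every bracketed chunk, so regmatches() returns a list with one character vector per input string.

```r
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -'

# (?<=\[) : must be preceded by "[" (not part of the match)
# [^\]]*  : a run of characters that are not "]"
# (?=\])  : must be followed by "]" (not part of the match)
inside <- regmatches(x, gregexpr("(?<=\\[)[^\\]]*(?=\\])", x, perl = TRUE))[[1]]
inside
```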
Upvotes: 2
Reputation: 174706
Use \K (which keeps the text matched so far out of the overall regex match) and a positive lookahead:
> regmatches(x,regexpr('\\[\\K[\\w:/]+\\s[+\\-]\\d{4}(?=\\])',x,perl=TRUE))
[1] "07/Dec/2014:06:43:43 -0800"
The \\K in \\[\\K discards the previously matched [ character.
Upvotes: 3
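When regmatches() is used with a regexpr() result, elements that did not match are silently dropped, so on a vector of log lines the output can be shorter than the input. A minimal sketch that keeps results aligned with the input, using the \K pattern from this answer; the second line is an illustrative non-matching input:

```r
lines <- c(
  '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -',
  "a line without a timestamp"   # illustrative non-matching input
)

pat <- "\\[\\K[\\w:/]+\\s[+\\-]\\d{4}(?=\\])"
m <- regexpr(pat, lines, perl = TRUE)

# regexpr() reports -1 for elements with no match; leave those as NA
out <- rep(NA_character_, length(lines))
out[m != -1] <- regmatches(lines, m)
out
```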
Reputation: 67968
(?<=\[)([\w:\/]+\s[+\-]\d{4})(?=\])
Try this. See the demo:
https://regex101.com/r/tX2bH4/16
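The pattern above is written in raw regex syntax, as entered on regex101. In an R string literal each backslash must be doubled, and the lookbehind (?<=...) requires perl = TRUE. A sketch of the same pattern used from R (the escaped \/ is not needed in R and is written as a plain / here):

```r
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -'

pat <- "(?<=\\[)([\\w:/]+\\s[+\\-]\\d{4})(?=\\])"

# The lookbehind and lookahead keep the brackets out of the match itself,
# so the whole match is already the bare timestamp
timestamp <- regmatches(x, regexpr(pat, x, perl = TRUE))
timestamp
```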
Upvotes: 2