Arima

Reputation: 195

R regular expressions, trying to capture a group

I've read a few of the other questions on R capture groups in regular expressions and I'm not having much luck.

I have a string:

127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] \"OPTIONS * HTTP/1.0\" 200 - \"-\" \"Apache/2.2.14 (Ubuntu) PHP/5.3.2-1ubuntu4.24 with Suhosin-Patch mod_ssl/2.2.14 OpenSSL/0.9.8k mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1 (internal dummy connection)\"

From which I am trying to capture a timestamp:

07/Dec/2014:06:43:43 -0800

The following function invocation returns a match:

regmatches(x,regexpr('\\[([\\w:/]+\\s[+\\-]\\d{4})\\]',x,perl=TRUE))
[1] "[07/Dec/2014:06:43:43 -0800]"

I've tried to capture just the single group with str_match, using several variations of this regex:

str_match(x, "\\[([\\w:/]+\\s[+\\-]\\d{4})\\]")
     [,1] [,2]
[1,] NA   NA

To no avail. Variations of this regex test correctly in most of the online regex testers, so I don't think the regex itself is the problem.

How can I get just the timestamp itself so I can pump it into strptime, without resorting to something like a gsub to strip the brackets? gsub doesn't give me the group, and neither does str_match. What am I missing? The ideal output would be

07/Dec/2014:06:43:43 -0800

which I could then use in strptime.

Upvotes: 3

Views: 237

Answers (4)

Sven Hohenstein

Reputation: 81693

It is very easy with sub. You can replace the whole string with the matching group.

sub(".*\\[([A-Za-z0-9:/]+\\s[+-]\\d{4})\\].*", "\\1", x)
# [1] "07/Dec/2014:06:43:43 -0800"
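To connect this to the strptime step mentioned in the question, here is a minimal sketch. It assumes a shortened, hypothetical log line and an English locale (the %b month abbreviation is locale-dependent); the variable names ts and parsed are illustrative only. The character class is written as A-Za-z because the range A-z also covers a few punctuation characters.

```r
# Shortened, hypothetical log line; only the bracketed timestamp matters here.
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -'

# Replace the whole line with the first capture group.
ts <- sub(".*\\[([A-Za-z0-9:/]+\\s[+-]\\d{4})\\].*", "\\1", x)

# Parse the timestamp. %z reads the numeric UTC offset; %b needs a locale
# whose month abbreviations are English, otherwise the result is NA.
parsed <- strptime(ts, format = "%d/%b/%Y:%H:%M:%S %z", tz = "UTC")
```

If the line contains no bracketed timestamp, sub returns the input unchanged, so it may be worth checking with grepl first.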

Upvotes: 1

David Arenburg

Reputation: 92292

Try the qdapRegex package; it has a special method for extracting elements from square brackets:

library(qdapRegex)
rm_square(x, extract = TRUE)[[1]]
## [1] "07/Dec/2014:06:43:43 -0800"

Upvotes: 2

Avinash Raj

Reputation: 174706

Use \K (\K keeps the text matched so far out of the overall regex match) and a positive lookahead.

regmatches(x,regexpr('\\[\\K[\\w:/]+\\s[+\\-]\\d{4}(?=\\])',x,perl=TRUE))
[1] "07/Dec/2014:06:43:43 -0800"

\\K in \\[\\K discards the previously matched [ character.
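If an actual capture group is preferred over \K, one possible alternative is regexec, which returns the full match followed by each group. This is only a sketch: it assumes an R version in which regexec accepts perl = TRUE (added, as far as I know, in R 3.3.0), and the shortened log line and the names m, parts and ts are hypothetical.

```r
# Shortened, hypothetical log line.
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -'

# perl = TRUE so that \w inside the character class is read as a word class.
m <- regexec("\\[([\\w:/]+\\s[+\\-]\\d{4})\\]", x, perl = TRUE)

# regmatches returns one character vector per input string:
# element 1 is the full match, element 2 is the first capture group.
parts <- regmatches(x, m)[[1]]
ts <- parts[2]
```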

Upvotes: 3

vks

Reputation: 67968

(?<=\[)([\w:\/]+\s[+\-]\d{4})(?=\])

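Try this. In R, double each backslash inside the string and pass perl=TRUE, since lookarounds need the PCRE engine. See the demo: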

https://regex101.com/r/tX2bH4/16
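The pattern above is written in plain regex syntax. In R it might look like the following sketch, with backslashes doubled and perl = TRUE; the shortened log line and the names pattern and ts are assumptions for illustration. The capture group is dropped here because the lookarounds already keep the brackets out of the match.

```r
# Shortened, hypothetical log line.
x <- '127.0.0.1 - - [07/Dec/2014:06:43:43 -0800] "OPTIONS * HTTP/1.0" 200 -'

# Lookbehind for "[" and lookahead for "]": both are asserted but not consumed.
pattern <- "(?<=\\[)[\\w:/]+\\s[+\\-]\\d{4}(?=\\])"

# regexpr locates the first match; regmatches extracts it.
ts <- regmatches(x, regexpr(pattern, x, perl = TRUE))
```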

Upvotes: 2
