Reputation: 69
I have a regular expression that uses numbered capture groups:
\\b${JOB_SEARCH_RESULTS_RANGE_KEY}\\s+((\\d+)-(\\d+)|\\*)/(\\d+|\\*)
It parses a Content-Range header:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Range
Currently, this regular expression allows a * on either side, for the range or for the size. I want to catch the case where there is a * on both sides.
How can I do this?
I'm very new to regex. Any help is greatly appreciated.
Upvotes: 0
Views: 129
Reputation: 841
I think you're trying to change the regular expression so that it no longer matches a header containing */*. The problem, however, is that your current expression matches all of these situations:
Content-Range: <unit> <range-start>-<range-end>/<size>
Content-Range: <unit> <range-start>-<range-end>/*
Content-Range: <unit> */<size>
Content-Range: <unit> */*
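To see this for yourself, here is a minimal sketch that runs the original pattern against one sample value per form. The unit "jobs" is only a placeholder, because the question does not show the real value of JOB_SEARCH_RESULTS_RANGE_KEY, and the names originalPattern and sampleHeaders are mine.

```typescript
// Assumption: "jobs" stands in for the real JOB_SEARCH_RESULTS_RANGE_KEY,
// which is not shown in the question.
const JOB_SEARCH_RESULTS_RANGE_KEY = "jobs";

// The original pattern, built from a template literal. Each "\\" in the
// literal becomes a single "\" in the string handed to RegExp.
const originalPattern = new RegExp(
  `\\b${JOB_SEARCH_RESULTS_RANGE_KEY}\\s+((\\d+)-(\\d+)|\\*)/(\\d+|\\*)`
);

// One sample per form listed above, in the same order.
const sampleHeaders: string[] = [
  "jobs 0-24/100", // <range-start>-<range-end>/<size>
  "jobs 0-24/*",   // <range-start>-<range-end>/*
  "jobs */100",    // */<size>
  "jobs */*",      // */*
];

const originalResults: boolean[] = sampleHeaders.map((header) =>
  originalPattern.test(header)
);

console.log(originalResults); // [ true, true, true, true ]
```

The last entry is the one you want to turn into false.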
I can think of 5 ways to expand your regular expression so that it matches only the first three cases.
(?=)
\\b${JOB_SEARCH_RESULTS_RANGE_KEY}\\s+((\\d+)-(\\d+)|\\*(?=/\\d))/(\\d+|\\*)
It only matches * if it's followed by / and a digit.
(?!)
\\b${JOB_SEARCH_RESULTS_RANGE_KEY}\\s+((\\d+)-(\\d+)|\\*(?!/\\*))/(\\d+|\\*)
It only matches * if it's not followed by /*.
(?<=)
\\b${JOB_SEARCH_RESULTS_RANGE_KEY}\\s+((\\d+)-(\\d+)|\\*)/(\\d+|(?<=\\d/)\\*)
It only matches * if it's preceded by a digit and /.
(?<!)
\\b${JOB_SEARCH_RESULTS_RANGE_KEY}\\s+((\\d+)-(\\d+)|\\*)/(\\d+|(?<!\\*/)\\*)
It only matches * if it's not preceded by */.
This page explains the so-called lookaround assertions in more detail: https://www.regular-expressions.info/lookaround.html
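As a quick check, the four lookaround variants can be run side by side against the four header forms. This is only a sketch under a few assumptions: "jobs" is a placeholder for the real range key, the helper acceptedForms and the variant labels are names I made up, and lookbehind needs an engine that supports it (ES2018 or later).

```typescript
// Assumption: "jobs" is a placeholder for the real range key.
const RANGE_KEY = "jobs";
const keyPrefix = `\\b${RANGE_KEY}\\s+`;

// The four lookaround variants from this answer.
const lookaroundVariants: { label: string; pattern: RegExp }[] = [
  {
    label: "positive lookahead",
    pattern: new RegExp(`${keyPrefix}((\\d+)-(\\d+)|\\*(?=/\\d))/(\\d+|\\*)`),
  },
  {
    label: "negative lookahead",
    pattern: new RegExp(`${keyPrefix}((\\d+)-(\\d+)|\\*(?!/\\*))/(\\d+|\\*)`),
  },
  {
    label: "positive lookbehind",
    pattern: new RegExp(`${keyPrefix}((\\d+)-(\\d+)|\\*)/(\\d+|(?<=\\d/)\\*)`),
  },
  {
    label: "negative lookbehind",
    pattern: new RegExp(`${keyPrefix}((\\d+)-(\\d+)|\\*)/(\\d+|(?<!\\*/)\\*)`),
  },
];

// One sample value per header form; only the last one should be rejected.
const headerForms: string[] = ["jobs 0-24/100", "jobs 0-24/*", "jobs */100", "jobs */*"];

// Hypothetical helper: reports which of the sample forms a pattern accepts.
function acceptedForms(pattern: RegExp): boolean[] {
  return headerForms.map((value) => pattern.test(value));
}

for (const variant of lookaroundVariants) {
  // Each variant prints: [ true, true, true, false ]
  console.log(variant.label, acceptedForms(variant.pattern));
}
```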
(|)
\\b${JOB_SEARCH_RESULTS_RANGE_KEY}\\s+(((\\d+)-(\\d+)|\\*)/(\\d+)|((\\d+)-(\\d+))/(\\d+|\\*))
To make this explanation more readable, let r stand for a range and s for a size, neither of which is *. The expression takes the form ((r or *)/s) or (r/(s or *)), which simply removes the possibility of matching */*.
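One side effect of the "or" version is that the capture group numbers change: the range start, range end and size land in groups 3, 4 and 5 when the first branch matches, and in groups 7, 8 and 9 when the second one does. The sketch below shows one way to read them back. The names parseRange and ParsedRange and the "jobs" unit are my own placeholders, not something from your code.

```typescript
// Assumption: "jobs" is a placeholder for the real range key.
const RANGE_UNIT = "jobs";

const alternationPattern = new RegExp(
  `\\b${RANGE_UNIT}\\s+(((\\d+)-(\\d+)|\\*)/(\\d+)|((\\d+)-(\\d+))/(\\d+|\\*))`
);

// Hypothetical result shape: null means the header used "*" for that part.
interface ParsedRange {
  start: number | null;
  end: number | null;
  size: number | null;
}

function parseRange(value: string): ParsedRange | null {
  const m = alternationPattern.exec(value);
  if (m === null) {
    return null; // no match, which includes the */* case
  }
  // Groups of the branch that did not take part in the match are undefined,
  // so fall back from the first branch (3, 4, 5) to the second (7, 8, 9).
  const start: string | undefined = m[3] ?? m[7];
  const end: string | undefined = m[4] ?? m[8];
  const size: string | undefined = m[5] ?? m[9];
  return {
    start: start === undefined ? null : Number(start),
    end: end === undefined ? null : Number(end),
    size: size === undefined || size === "*" ? null : Number(size),
  };
}

console.log(parseRange("jobs 0-24/100")); // { start: 0, end: 24, size: 100 }
console.log(parseRange("jobs 0-24/*"));   // { start: 0, end: 24, size: null }
console.log(parseRange("jobs */100"));    // { start: null, end: null, size: 100 }
console.log(parseRange("jobs */*"));      // null
```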
My lookaround examples (1-4) are pretty similar and you can choose any one of them. However, they are not foolproof. They only check the nearest character on the other side of the slash (/) and therefore assume that there is no malicious input such as Content-Range: bytes *1/*. You can expand the expression to catch these situations as well, but then you would be better off with the "or" example that I gave, as it will be shorter, easier to read, and perhaps even faster in execution. The "or" example is just one of many, and perhaps someone else can come up with an even shorter expression. My advice would be to choose the expression that looks the easiest to understand.
One more alternative that I did not list is to keep the original regular expression and separately make sure the string does not contain */* using a simple substring check. It's perhaps the easiest solution to read.
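A rough sketch of that last idea, assuming the header value is available as a plain string. The function name isAcceptableRange and the "jobs" unit are placeholders of mine.

```typescript
// Assumption: "jobs" is a placeholder for the real range key.
const UNIT_PLACEHOLDER = "jobs";

// The original pattern from the question, left unchanged.
const unchangedPattern = new RegExp(
  `\\b${UNIT_PLACEHOLDER}\\s+((\\d+)-(\\d+)|\\*)/(\\d+|\\*)`
);

// Hypothetical helper: reject */* with a substring check first,
// then let the original pattern decide about everything else.
function isAcceptableRange(value: string): boolean {
  if (value.indexOf("*/*") !== -1) {
    return false;
  }
  return unchangedPattern.test(value);
}

console.log(isAcceptableRange("jobs 0-24/*")); // true
console.log(isAcceptableRange("jobs */100"));  // true
console.log(isAcceptableRange("jobs */*"));    // false
```

The regular expression stays exactly as you wrote it, so your existing capture group numbers keep working.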
Upvotes: 0