Reputation: 34800
Our web application allows users to specify their own "slugs" which can include relative paths e.g. /somedir/some-file.htm.
In our routing configuration we need to ensure that only valid slugs (with segments) are supported.
The regex I am using is:
(^[a-z0-9])([a-z0-9-/]+)([a-z0-9])$
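This means: the slug must start with a lowercase letter or digit, continue with one or more lowercase letters, digits, hyphens or slashes, and end with a lowercase letter or digit.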
Unfortunately it also means that double slashes will match e.g. somedir//subdir//some-file.htm because my expression is allowing one or more slashes.
How can I change it so that segments are separated by a single slash only?
I thought that:
(^[a-z0-9])(/?[a-z0-9-]+/?)([a-z0-9])$
would work but it does not.
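To make both problems concrete, here is a small Python sketch. Python's re module is an assumption, since the question does not say which regex engine the routing layer uses, and the names ORIGINAL, ATTEMPT and matches are only illustrative. The sample slugs leave out the .htm extension because neither expression includes a period.

```python
import re

# The two expressions from the question, copied verbatim.
ORIGINAL = re.compile(r"(^[a-z0-9])([a-z0-9-/]+)([a-z0-9])$")
ATTEMPT = re.compile(r"(^[a-z0-9])(/?[a-z0-9-]+/?)([a-z0-9])$")

def matches(pattern, slug):
    """Return True if the pattern matches the slug starting at its first character."""
    return pattern.match(slug) is not None

samples = [
    "somedir/some-file",            # valid slug
    "somedir//subdir//some-file",   # double slashes, should be rejected
    "somedir/subdir/some-file",     # valid slug with two slashes
]
for slug in samples:
    print(f"{slug!r}: original={matches(ORIGINAL, slug)}, attempt={matches(ATTEMPT, slug)}")
```

The second expression has no repetition around the slash, so a slash can only appear directly after the first character or directly before the last one. That is why it rejects ordinary multi-segment slugs as well as the double-slash ones.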
Upvotes: 6
Views: 19088
Reputation: 10432
My requirements were very different so I'm answering for others that are coming here looking for answers rather than answering the exact question and requirements posted here.
My requirements: match paths that start with ./ or ../ and that may contain / characters.
This is what I came up with (based on JS RegEx syntax):
My solution
/\.\.?\/[^\n"?:*<>|]+\.[A-Za-z0-9]+/g
I'll explain it using this as an example:
../path/to/file.ext lorem ipsum text
/..../g means the regex will match multiple times inside the same string.
\.\.?\/ matches both ../ and ./
Regex so far: /\.\.?\//g
Matched: ../
Not matched: path/to/file.ext lorem ipsum text
[^\n"?:*<>|]+ is a blacklist of characters that must not appear in the match.
[^....] = do not match any character in this list.
\n = new line (a path never contains a line break).
Regex so far: /\.\.?\/[^\n"?:*<>|]+/g
Matched: ../path/to/file.ext lorem ipsum text
\.[A-Za-z0-9]+ makes sure that the match stops at the end of the file extension.
Final regex: /\.\.?\/[^\n"?:*<>|]+\.[A-Za-z0-9]+/g
Matched: ../path/to/file.ext
Not matched: lorem ipsum text
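The same walkthrough can be tried in Python. This is only a sketch: the answer targets JS syntax, so the translation to Python's re module and the names REL_PATH and find_relative_paths are assumptions. The extension class is spelled A-Za-z because the range A-z would also match the characters [ \ ] ^ _ and the backtick.

```python
import re

# Relative path: "./" or "../", then any run of allowed characters,
# then a period and an alphanumeric extension.
REL_PATH = re.compile(r'\.\.?/[^\n"?:*<>|]+\.[A-Za-z0-9]+')

def find_relative_paths(text):
    """Return every relative path found in the text (the equivalent of the /g flag)."""
    return REL_PATH.findall(text)

print(find_relative_paths("../path/to/file.ext lorem ipsum text"))
print(find_relative_paths("./a.txt\n../b/c.md"))
```

Because the middle part is greedy and allows spaces, the match runs up to the last period on the line. For example, "see ./a.txt and version 1.2 here" yields "./a.txt and version 1.2", so the pattern works best when there is at most one period after the path on each line.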
Upvotes: 0
Reputation: 609
Check for a valid path (relative or absolute; the dots allow for hidden folders):
^([a-z]:)*(\/*(\.*[a-z0-9]+\/)*(\.*[a-z0-9]+))
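A quick way to try this expression is the Python sketch below. Python and the names PATH and is_valid_path are assumptions for illustration. The expression has no end anchor, so the sketch uses fullmatch to require that the whole string is a path instead of just a prefix of it.

```python
import re

# The expression from the answer, copied verbatim (lowercase only, no end anchor).
PATH = re.compile(r"^([a-z]:)*(\/*(\.*[a-z0-9]+\/)*(\.*[a-z0-9]+))")

def is_valid_path(path):
    """Return True only if the entire string matches the expression."""
    return PATH.fullmatch(path) is not None

for candidate in ["/usr/.config/app", "c:/users/me", "dir//x", "file.txt"]:
    print(candidate, is_valid_path(candidate))

# Without fullmatch, only a prefix has to match:
print(PATH.match("dir//x").group(0))
```

Two things stand out when running it: a plain match on "dir//x" succeeds because the prefix "dir" is enough, and "file.txt" is rejected as a whole because the expression only accepts periods at the start of a segment.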
Upvotes: 0
Reputation: 17131
^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$
EDIT: Use this one if you like the first regex:
^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*$
It looks messy and complicated, but it seems to be correct per your spec.
[a-z0-9]([a-z0-9-]*[a-z0-9])?
matches a single name, ignoring the / separators for the moment.
Then the rest of it is a single slash followed by that same thing again.
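The way the expression is assembled from one repeated piece can be sketched in Python. The language and the names SEGMENT, SLUG and is_valid_slug are assumptions for illustration, not part of the answer. The segment pattern is defined once and reused after each slash, and fullmatch plays the role of the ^ and $ anchors.

```python
import re

# One segment: starts and ends with a letter or digit, hyphens only in between.
SEGMENT = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# A slug is one segment followed by zero or more "/segment" groups.
SLUG = re.compile(rf"{SEGMENT}(?:/{SEGMENT})*")

def is_valid_slug(slug: str) -> bool:
    """Return True if the whole string is a slash-separated list of segments."""
    return SLUG.fullmatch(slug) is not None

for candidate in ["somedir/subdir/some-file", "somedir//subdir", "-bad", "some-file.htm"]:
    print(candidate, is_valid_slug(candidate))
```

Keeping the segment in its own variable means a later change, such as allowing periods, only has to be made in one place.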
As mentioned in Karoly's answer, this does not allow literal periods; for instance, "some-file.htm" will not match the regex I wrote.
If you do want periods to match, then you'll actually want:
^[a-z0-9]([a-z0-9-\.]*[a-z0-9])?(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)*$
Finally, if you want to allow literal periods in only the last section then you'll want:
^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$
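The difference between the two period-aware variants is easiest to see side by side. The following is a hedged Python sketch (Python and the names PERIODS_ANYWHERE, PERIODS_LAST_ONLY and accepts are assumptions) with both expressions copied verbatim from above.

```python
import re

# Periods allowed inside every segment.
PERIODS_ANYWHERE = re.compile(
    r"^[a-z0-9]([a-z0-9-\.]*[a-z0-9])?(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)*$"
)
# Periods allowed only in a final segment that follows a slash.
PERIODS_LAST_ONLY = re.compile(
    r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*"
    r"(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$"
)

def accepts(pattern, slug):
    """Return True if the pattern matches the entire slug."""
    return pattern.fullmatch(slug) is not None

for slug in ["somedir/some-file.htm", "v1.2/readme.txt", "some-file.htm", "somedir/.htaccess"]:
    print(slug, accepts(PERIODS_ANYWHERE, slug), accepts(PERIODS_LAST_ONLY, slug))
```

One consequence of the second variant is worth knowing: the first segment never accepts a period, so a slug without any slash, such as "some-file.htm", is rejected by it.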
EDIT:
It occurs to me that this can be simplified a bit using lookaheads and lookbehinds.
^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$
becomes:
^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*(/(?![-.])[a-z0-9-\.]+(?<![-.]))?$
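A lookaround that has to reject either a hyphen or a period needs a character class, as in (?![-.]). Writing (?!-\.) only rejects the two-character sequence "-." and lets a lone leading hyphen or period through. The Python sketch below (Python and all names in it are assumptions) compares the long form with both lookaround spellings on a handful of samples.

```python
import re

LONG_FORM = re.compile(
    r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*"
    r"(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$"
)
# Lookarounds with character classes: reject "-" or "." at either end.
LOOKAROUND = re.compile(
    r"^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*"
    r"(/(?![-.])[a-z0-9-\.]+(?<![-.]))?$"
)
# Lookarounds with a sequence: only reject the literal pair "-.".
SEQUENCE = re.compile(
    r"^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*"
    r"(/(?!-\.)[a-z0-9-\.]+(?<!-\.))?$"
)

SAMPLES = [
    "somedir/some-file.htm", "a/b/c.d", "a/-x.htm", "a/.hidden", "a/file.",
    "a//b", "-a/b", "a-/b", "some-file.htm", "a/b.c/d",
]

def verdict(pattern, slug):
    """Return True if the pattern matches the entire slug."""
    return pattern.fullmatch(slug) is not None

# Samples on which the long form and the character-class lookarounds disagree.
disagreements = [s for s in SAMPLES if verdict(LONG_FORM, s) != verdict(LOOKAROUND, s)]
print(disagreements)
print(verdict(SEQUENCE, "a/-x.htm"), verdict(LONG_FORM, "a/-x.htm"))
```

On these samples the character-class version agrees with the long form everywhere, while the sequence version accepts "a/-x.htm" and "a/.hidden", which the long form rejects.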
Upvotes: 6
Reputation: 96258
(^[a-z0-9]+)(/[a-z0-9-]+)*([a-z0-9])$
Note: I don't see a . in your regexp.
Personally, I would test the first and last characters separately; that makes the regexp a lot simpler and more usable.
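One possible reading of that suggestion, sketched in Python (the language, the names ALNUM, BODY and is_valid_slug_simple, and the exact split of responsibilities are all assumptions): check the first and last characters with plain string tests and leave only the slash rule to a short regex.

```python
import re

# Characters allowed at the very start and very end of the slug.
ALNUM = set("abcdefghijklmnopqrstuvwxyz0123456789")

# Non-empty runs of letters, digits and hyphens, separated by single slashes.
BODY = re.compile(r"[a-z0-9-]+(?:/[a-z0-9-]+)*")

def is_valid_slug_simple(slug: str) -> bool:
    """Check the end characters separately, then the slash structure."""
    if not slug:
        return False
    if slug[0] not in ALNUM or slug[-1] not in ALNUM:
        return False
    return BODY.fullmatch(slug) is not None

for candidate in ["somedir/some-file", "a//b", "-a", "a/", "a-/-b"]:
    print(candidate, is_valid_slug_simple(candidate))
```

This version is looser than a per-segment rule: it only inspects the two ends of the whole slug, so "a-/-b" is accepted even though its segments start or end with a hyphen.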
Upvotes: 1