Ben Foster

Reputation: 34800

Relative Path Regular Expression

Our web application allows users to specify their own "slugs" which can include relative paths e.g. /somedir/some-file.htm.

In our routing configuration we need to ensure that only valid slugs (with segments) are supported.

The regex I am using is:

(^[a-z0-9])([a-z0-9-/]+)([a-z0-9])$

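This means:

  • It must start with a-z or 0-9
  • It can then contain one or more of a-z, 0-9, - or /
  • It must end with a-z or 0-9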

Unfortunately, it also means that double slashes will match, e.g. somedir//subdir//some-file.htm, because my expression allows one or more slashes.
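A quick way to confirm this behaviour, sketched in JavaScript (the language is an assumption; the question does not say which regex engine the routing uses). The sample slugs are illustrative and leave off the .htm extension, because the character class in the expression does not include a period.

```javascript
// The expression from the question, built from a string so "/" needs no escaping.
const original = new RegExp("(^[a-z0-9])([a-z0-9-/]+)([a-z0-9])$");

// Illustrative sample slugs (not taken from a real routing table).
const results = {
  single: original.test("somedir/subdir/some-file"),    // true
  doubled: original.test("somedir//subdir//some-file"), // true: "/" may repeat freely
  leadingSlash: original.test("/somedir/some-file"),    // false: must start with a-z or 0-9
};

console.log(results);
```

The doubled slashes pass because "/" sits in the same character class as the letters, so nothing limits how many appear in a row.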

How can I change it so that at most one slash is allowed between segments?

I thought that:

(^[a-z0-9])(/?[a-z0-9-]+/?)([a-z0-9])$

would work but it does not.
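To see where the second attempt falls short, here is a small JavaScript check (the language and the sample slugs are assumptions for illustration). The middle group appears only once, so it can hold a single run of non-slash characters, and only one more character may follow the optional trailing slash.

```javascript
// The second attempt from the question.
const attempt = new RegExp("(^[a-z0-9])(/?[a-z0-9-]+/?)([a-z0-9])$");

const attemptResults = {
  tiny: attempt.test("ab/c"),                           // true: one char after the slash
  twoSegments: attempt.test("somedir/some-file"),       // false: last segment is too long
  threeSegments: attempt.test("somedir/subdir/some-file"), // false
};

console.log(attemptResults);
```

So the expression rejects double slashes, but it also rejects most legitimate slugs, which suggests the slash-plus-segment part needs to be a repeated group.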

Upvotes: 6

Views: 19088

Answers (4)

Daniel Tonon

Reputation: 10432

My requirements were quite different, so this answer is aimed at others who arrive here looking for a path regex rather than at the exact question and requirements posted here.

My requirements:

  • It needed to match against multiple paths inside the same string
  • Each path was known to start with either ./ or ../
  • It could not use lookbehinds because they were not supported
  • It was known that each path would be on a new line
  • Folders would always be separated using single / characters.

This is what I came up with (based on JS RegEx syntax):

My solution

/\.\.?\/[^\n"?:*<>|]+\.[A-z0-9]+/g

I'll explain it using this as an example:

../path/to/file.ext lorem ipsum text

  • /..../g means the regex will match multiple times inside the same string
  • \.\.?\/ matches both ../ and ./

    /\.\.?\//g

    ../path/to/file.ext lorem ipsum text

  • [^\n"?:*<>|]+ is a negated character class: it matches one or more characters that are not in the list.
    • [^....] = match any character except the ones listed.
    • \n = new line (each path is on its own line, so a match should stop at the line break)
    • all the other characters are literal. They are illegal file name characters.

      /\.\.?\/[^\n"?:*<>|]+/g

      ../path/to/file.ext lorem ipsum text

  • \.[A-z0-9]+ is to make sure that it stops at the end of the file extension.

    /\.\.?\/[^\n"?:*<>|]+\.[A-z0-9]+/g

    ../path/to/file.ext lorem ipsum text
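The steps above can be tried out with a short script. This is only a sketch: the second sample line is made up, and the extension class is written as [A-Za-z0-9], since the wider [A-z] range also covers the punctuation characters that sit between Z and a in ASCII.

```javascript
// Pattern from the answer, with the extension class spelled out as [A-Za-z0-9].
const pathPattern = /\.\.?\/[^\n"?:*<>|]+\.[A-Za-z0-9]+/g;

// Two illustrative lines, one path per line.
const text = [
  "../path/to/file.ext lorem ipsum text",
  "./other/script.js more text",
].join("\n");

// With the g flag, match() returns every match in the string.
const matches = text.match(pathPattern);
console.log(matches); // [ '../path/to/file.ext', './other/script.js' ]
```

One thing to keep in mind: the middle part is greedy, so if the text after a path on the same line contains another period followed by a letter or digit (for example "v1.2"), the match will extend up to that point.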

Upvotes: 0

sangress

Reputation: 609

Check for a valid path, relative or absolute (the dot is for hidden folders):
^([a-z]:)*(\/*(\.*[a-z0-9]+\/)*(\.*[a-z0-9]+))
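A small JavaScript check of this expression (sample paths are illustrative). Note that the pattern as posted has no end anchor, so it succeeds as soon as a valid prefix is found; the `anchored` variant below, with a `$` appended, is an assumption added here for comparison and is not part of the answer.

```javascript
// The expression as posted.
const posted = /^([a-z]:)*(\/*(\.*[a-z0-9]+\/)*(\.*[a-z0-9]+))/;
// Hypothetical variant: same pattern with an end anchor appended.
const anchored = new RegExp(posted.source + "$");

const checks = {
  postedValid: posted.test("c:/users/.config/file"),     // true
  anchoredValid: anchored.test("c:/users/.config/file"), // true
  postedDouble: posted.test("some//dir"),                // true: only the prefix "some" is matched
  anchoredDouble: anchored.test("some//dir"),            // false
};

console.log(checks);
```

The character classes here contain no hyphen, so with the end anchor a slug such as somedir/some-file would be rejected as well.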

Upvotes: 0

OmnipotentEntity

Reputation: 17131

^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$

EDIT: Use this one if you like the first regex:

^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*$

It looks messy and complicated, but it seems to be correct per your spec.

[a-z0-9]([a-z0-9-]*[a-z0-9])?

This matches a single segment name, ignoring slashes for the moment.

Then the rest of it is a single slash followed by that same thing again.
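The "one name, then slash plus the same thing again" structure can be made explicit by building the expression from a named piece. This is a sketch in JavaScript; the variable names and sample slugs are illustrative, and non-capturing groups are used in place of the capturing ones, which does not change what is matched.

```javascript
// One name: starts and ends with a letter or digit, hyphens only in between.
const segment = "[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";
// A slug is one segment followed by zero or more "/segment" groups.
const slug = new RegExp(`^${segment}(?:/${segment})*$`);

const samples = [
  "somedir/subdir/some-file", // true
  "a",                        // true
  "somedir//subdir",          // false: empty segment between the slashes
  "-dir/file",                // false: segment starts with a hyphen
  "dir/file-",                // false: segment ends with a hyphen
  "dir/",                     // false: trailing slash
];

const outcome = samples.map((s) => slug.test(s));
console.log(outcome);
```

Because every slash must be followed by a complete segment, double slashes and trailing slashes are ruled out without any special case.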

As mentioned in Karoly's answer, this does not include literal periods; for instance, "some-file.htm" will not match the regex I wrote.

If you want to allow periods in every segment, then you'll actually want:

^[a-z0-9]([a-z0-9-\.]*[a-z0-9])?(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)*$

Finally, if you want to allow literal periods in only the last section then you'll want:

^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$
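A brief JavaScript check of this last expression, with made-up sample slugs. One consequence worth noticing: the period-friendly group requires a leading slash, so a slug that consists of a single segment containing a period is not accepted.

```javascript
// The "periods only in the last section" expression, copied as posted.
const slug = new RegExp(
  String.raw`^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$`
);

const dotChecks = {
  lastSegmentDot: slug.test("somedir/some-file.htm"), // true
  earlierDot: slug.test("some.dir/file.htm"),         // false: period outside the last segment
  singleSegmentDot: slug.test("some-file.htm"),       // false: no slash before the dotted part
  doubleSlash: slug.test("somedir//file.htm"),        // false
};

console.log(dotChecks);
```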

EDIT:

It occurs to me that this can be simplified a bit using lookaheads and lookbehinds.

^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$

becomes:

^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*(/(?![-.])[a-z0-9-\.]+(?<![-.]))?$
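Here is a sketch of the lookaround form in JavaScript, assuming an engine with lookbehind support (ES2018 or later). The lookarounds on the last segment are written with the character class [-.], so that a leading or trailing hyphen or period is each rejected on its own; the sample slugs are illustrative.

```javascript
// Lookaround version: no segment may start or end with "-",
// and the last segment may contain periods but not start or end with one.
const slug = new RegExp(
  String.raw`^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*(/(?![-.])[a-z0-9.-]+(?<![-.]))?$`
);

const lookChecks = {
  valid: slug.test("somedir/some-file.htm"), // true
  leadingDot: slug.test("somedir/.htm"),     // false
  trailingDot: slug.test("somedir/file."),   // false
  doubleSlash: slug.test("somedir//file"),   // false
  leadingHyphen: slug.test("-dir/file"),     // false
};

console.log(lookChecks);
```

Where lookbehinds are not available, the longer expression without lookarounds remains the safer choice.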

Upvotes: 6

Karoly Horvath

Reputation: 96258

(^[a-z0-9]+)(/[a-z0-9-]+)*([a-z0-9])$

Note: I don't see a . in your regexp.

Personally, I would test the first and last characters separately; that makes the regexp a lot simpler and more usable.
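One way the "test the ends separately" idea might look in JavaScript. The helper name isValidSlug and the body pattern are assumptions made for illustration, not code from this answer.

```javascript
// Hypothetical helper: check the first and last characters on their own,
// then use a simpler pattern for the overall shape.
function isValidSlug(slug) {
  const isAlnum = (ch) => /^[a-z0-9]$/.test(ch);
  if (slug.length === 0) return false;
  if (!isAlnum(slug[0]) || !isAlnum(slug[slug.length - 1])) return false;
  // Shape: runs of letters, digits and hyphens separated by single slashes.
  return /^[a-z0-9-]+(?:\/[a-z0-9-]+)*$/.test(slug);
}

console.log(isValidSlug("somedir/some-file"));  // true
console.log(isValidSlug("somedir//some-file")); // false
console.log(isValidSlug("/somedir"));           // false
console.log(isValidSlug("somedir-"));           // false
```

This only constrains the first and last characters of the whole slug, as in the question's original rule, so an inner segment such as "dir-/-file" would still pass; checking each segment would need the per-segment patterns from the other answers.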

Upvotes: 1
