Use Regular Expressions to find URLs without certain word patterns

Question

I am trying to write a regular expression that matches URLs that don't contain a certain pattern. The URLs I want to match must not have an ID in them, where an ID is 40 uppercase hexadecimal characters.

For example, if I have the following URLs:

/dev/api/appid/A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5/users

/dev/api/apps/list

/dev/api/help/apps/applicationname/apple/osversion/list/

(The URLs are made up, but the idea is that some endpoints contain 40-character IDs, some don't, and some are simply very long in total.)

I want to make sure that the regular expression is only able to match the last 2 URLs, and not the first one.

I wrote the following regex,

\S+(?:[0-9A-F]{40})\S+

and it matches endpoints that do have the long ID in them, but skips the ones I actually want to match. If I try to negate the regex,

\S+(?![0-9A-F]{40})\S+

it matches all endpoints, because some URLs are longer than the 40 characters of an ID.

How can I use a regular expression to filter out exactly the URLs I need?

Gurmanjot Singh · Accepted Answer

Try this regex:

^(?!.*\/[0-9A-F]{40}\/).*$

Explanation:

^ - asserts the start of the string/url
(?!.*\/[0-9A-F]{40}\/) - negative lookahead that checks for a / followed by exactly 40 uppercase hex characters followed by another / anywhere in the string. Since it is a negative lookahead, any string/url containing this pattern will not be matched.
.* - matches 0+ occurrences of any character except a newline character
$ - asserts the end of the string
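
As a quick check, here is a minimal Python sketch that applies the pattern to the three sample URLs from the question. The names NO_ID_PATTERN and urls_without_id are made up for illustration, and the escaped slashes (\/) are written as plain / because Python's re module does not need them escaped.

```python
import re

# Pattern from the answer: reject any string that contains
# "/<40 uppercase hex characters>/" anywhere in it.
NO_ID_PATTERN = re.compile(r"^(?!.*/[0-9A-F]{40}/).*$")

# Sample URLs from the question.
urls = [
    "/dev/api/appid/A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5/users",
    "/dev/api/apps/list",
    "/dev/api/help/apps/applicationname/apple/osversion/list/",
]


def urls_without_id(candidates):
    """Return only the URLs that do not contain a 40-character ID segment."""
    return [url for url in candidates if NO_ID_PATTERN.match(url)]


print(urls_without_id(urls))
# ['/dev/api/apps/list', '/dev/api/help/apps/applicationname/apple/osversion/list/']
```

Note that the pattern requires a / on both sides of the ID, so a URL that ends with the ID and has no trailing slash would still be matched. If that case matters, one possible variation (an assumption, not part of the answer) is to replace the final \/ inside the lookahead with (?:\/|$).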
