Reputation: 1343
I am trying to write a regular expression that matches URLs that don't contain a certain pattern. The URLs I want to match must not have an ID in them, where an ID is 40 uppercase hex characters.
For example, if I have the following URLs:
/dev/api/appid/A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5/users
/dev/api/apps/list
/dev/api/help/apps/applicationname/apple/osversion/list/
(urls are made up, but the idea is that there are some endpoints with 40-length IDs, and some endpoints that don't, and some endpoints that are really long in total characters)
I want to make sure that the regular expression is only able to match the last 2 URLs, and not the first one.
I wrote the following regex,
\S+(?:[0-9A-F]{40})\S+
and it matches endpoints that do have the long ID in them, but skips the ones I actually want to match. If I try to negate the regex,
\S+(?![0-9A-F]{40})\S+
it matches all endpoints, because some URLs have lengths that are greater than what the ID should be (40 characters).
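To see the behaviour of the negated attempt concretely, here is a minimal sketch in Python (an assumption, since no language is named above; the names `urls`, `negated`, and `matched` are made up for the illustration). The lookahead is only tested at the single position between the two `\S+` parts, and the engine can backtrack to any split where it passes, for example just before the last character.

```python
import re

urls = [
    "/dev/api/appid/A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5/users",
    "/dev/api/apps/list",
    "/dev/api/help/apps/applicationname/apple/osversion/list/",
]

# The negated attempt from above: one lookahead, checked at one position only.
negated = re.compile(r"\S+(?![0-9A-F]{40})\S+")

# The first \S+ backs off until the lookahead succeeds (e.g. right before the
# final character), so even the URL that contains the ID matches in full.
matched = [u for u in urls if negated.fullmatch(u)]

for u in urls:
    print(u, "->", "match" if u in matched else "no match")
```

All three URLs match here, including the one with the ID, which is why a single lookahead in the middle of the pattern cannot express "no ID anywhere".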
How can I use a regular expression to filter out exactly the URLs I need?
Upvotes: 0
Views: 630
Reputation: 10360
Try this regex:
^(?!.*\/[0-9A-F]{40}\/).*$
Explanation:
^ - asserts the start of the string/URL
(?!.*\/[0-9A-F]{40}\/) - negative lookahead to check for the presence of a / followed by exactly 40 hex characters followed by / somewhere in the string. Since it is a negative lookahead, any string/URL containing this pattern will not be matched.
.* - matches 0+ occurrences of any character except a newline character
$ - asserts the end of the string
Upvotes: 1
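As a rough check of the anchored-lookahead regex, here is a small Python sketch run against the URLs from the question. Python is an assumption (the question names no language), and `NO_ID`, `NO_ID_STRICT`, `without_id`, and `BARE_ID_URL` are hypothetical names. The second pattern is a variation, not part of the original answer: it also rejects an ID that is the last path segment with no trailing slash, a case the question does not show.

```python
import re

URLS = [
    "/dev/api/appid/A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5/users",
    "/dev/api/apps/list",
    "/dev/api/help/apps/applicationname/apple/osversion/list/",
]

# Same pattern as above; "/" needs no escaping in Python.
NO_ID = re.compile(r"^(?!.*/[0-9A-F]{40}/).*$")

# Hypothetical variation: the ID may be followed by "/" or by end of string.
NO_ID_STRICT = re.compile(r"^(?!.*/[0-9A-F]{40}(?:/|$)).*$")

def without_id(urls, pattern=NO_ID):
    """Return only the URLs that the given pattern accepts."""
    return [u for u in urls if pattern.match(u)]

# Made-up URL where the ID is the final segment.
BARE_ID_URL = "/dev/api/appid/" + "A1B2C3D4E5" * 4

print(without_id(URLS))                  # keeps the last two URLs
print(bool(NO_ID.match(BARE_ID_URL)))         # True: no trailing "/" after the ID
print(bool(NO_ID_STRICT.match(BARE_ID_URL)))  # False: "$" alternative catches it
```

Requiring the slashes means a longer run of hex characters inside one segment is not mistaken for an ID, at the cost of needing the `(?:/|$)` variation if IDs can end the URL.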
Reputation: 2991
^((?![A-F0-9]{40}).)*$
Uses a negative lookahead, applied before every character, to match any line that doesn't contain 40 uppercase hex digits in a row.
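A short Python sketch of how this pattern behaves, again assuming Python and using made-up names (`TEMPERED`, `HEX_RUN`, `SLASHED`, `LONG_RUN`). For single-line URLs it should give the same answer as simply searching for a 40-character hex run and negating the result in code. Unlike a pattern that requires slashes around the ID, it also rejects a longer hex run inside a segment.

```python
import re

URLS = [
    "/dev/api/appid/A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5A1B2C3D4E5/users",
    "/dev/api/apps/list",
    "/dev/api/help/apps/applicationname/apple/osversion/list/",
]

# Pattern from this answer: the lookahead is re-checked before each character.
TEMPERED = re.compile(r"^((?![A-F0-9]{40}).)*$")

# Alternative sketch: find the ID directly and negate the result in code.
HEX_RUN = re.compile(r"[A-F0-9]{40}")

# Slash-delimited pattern, included only for comparison.
SLASHED = re.compile(r"^(?!.*/[0-9A-F]{40}/).*$")

def keep_tempered(url):
    return TEMPERED.match(url) is not None

def keep_search(url):
    return HEX_RUN.search(url) is None

# Made-up URL with a 45-character hex run that is not a 40-character segment.
LONG_RUN = "/dev/api/file/" + "A" * 45 + ".bin"

print([keep_tempered(u) for u in URLS])  # [False, True, True]
print([keep_search(u) for u in URLS])    # [False, True, True]
print(keep_tempered(LONG_RUN), SLASHED.match(LONG_RUN) is not None)  # False True
```

Which behaviour is right depends on whether a 40-character run inside a longer segment should count as an ID, which the question leaves open.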
Upvotes: 1