jigar_bhalodia

Reputation: 37

Regex to match all paths which are not in a specific directory

Goal: I want to match all paths that are NOT in the elasticmapreduce/j-abc123/node/i-abc123/applications directory.

The following is a set of possible paths:

elasticmapreduce/j-abc123/node/i-abc123/applications/hadoop-yarn/hadoop-yarn-proxyserver-ip.log.2020-05-07-00.gz
elasticmapreduce/j-abc123/node/i-abc123/applications/hadoop-yarn/hadoop-yarn-timelineserver-ip.out.gz
elasticmapreduce/j-abc123/node/i-abc123/applications/hadoop-yarn/hadoop-yarn-proxyserver-ip.log.gz
elasticmapreduce/j-abc123/node/i-abc123/applications/hive/user/hive/hive.log.2020-05-07.gz
elasticmapreduce/j-abc123/node/i-abc123/applications
elasticmapreduce/j-abc123/node/i-abc123/bootstrap-actions/master.log.2020-05-07-00.gz
elasticmapreduce/j-abc123/node/i-abc123/bootstrap-actions
elasticmapreduce/j-abc123/node/i-abc123/daemons/instance-state/instance-state.log-2020-05-08-13-30.gz
elasticmapreduce/j-abc123/node/i-abc123/daemons/setup-dns.log.gz
elasticmapreduce/j-abc123/node/i-abc123/provision-node/abc123/stderr.gz
elasticmapreduce/j-abc123/node/i-abc123/provision-node/apps-phase/0/abc123/stderr.gz
elasticmapreduce/j-abc123/node/i-abc123/provision-node/reports/0/abc123/ip.ec2.internal/201805270306.yaml.gz
elasticmapreduce/j-abc123/node/i-abc123/setup-devices/setup_var_log_dir.log.gz

The following regex matches all paths containing elasticmapreduce/j-abc123/node/i-abc123/applications:

^elasticmapreduce\/j-.*\/node\/i-.*\/(applications(\/.*)*)$

I want to match all paths that were NOT matched by the above regex pattern.

Why doesn't the following regex do this?

^elasticmapreduce\/j-.*\/node\/i-.*\/(?!(applications(\/.*)*))$

PS, I use https://regex101.com/ to test regex patterns.

Upvotes: 0

Views: 39

Answers (1)

The fourth bird

Reputation: 163207

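The pattern that you tried does not work as you intended because the greedy i-.* first runs to the end of the string and then backtracks to the last occurrence of a /, after which this part has to succeed at that position: (?!(applications(\/.*)*))$

That part asserts that what is directly to the right is not applications followed by 0 or more repetitions of / and any characters, and then asserts the end of the string. A lookahead does not consume any characters, so $ has to match immediately after the /, which means the path would have to end with a /.

The engine keeps backtracking to earlier / positions, but none of the example paths ends with a /, so it cannot match any of them.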

I think this is easier to see when you omit the $ and look at where the match ends:

https://regex101.com/r/aXV8vO/1
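
If you want to check this outside regex101, here is a minimal Python sketch. Python's re module is only my choice for illustration (the question does not say which engine is used), and the variable names are made up. It runs the attempted pattern, and the same pattern without the trailing $, against three of the example paths.

```python
import re

# The pattern from the question, and the same pattern with the trailing $ dropped.
ATTEMPT = r"^elasticmapreduce\/j-.*\/node\/i-.*\/(?!(applications(\/.*)*))$"
WITHOUT_END = ATTEMPT[:-1]

paths = [
    "elasticmapreduce/j-abc123/node/i-abc123/applications/hive/user/hive/hive.log.2020-05-07.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/applications",
    "elasticmapreduce/j-abc123/node/i-abc123/bootstrap-actions/master.log.2020-05-07-00.gz",
]

# With $, the match would have to end right after a "/", so nothing matches.
print([bool(re.search(ATTEMPT, p)) for p in paths])
# [False, False, False]

# Without $, the match stops after the last "/" that is not followed by "applications".
for p in paths:
    m = re.search(WITHOUT_END, p)
    print(m.group(0) if m else None)
# elasticmapreduce/j-abc123/node/i-abc123/applications/hive/user/hive/
# None
# elasticmapreduce/j-abc123/node/i-abc123/bootstrap-actions/
```

Note that the first path is inside applications and still gets a partial match, because the lookahead is only checked after the last /, not after the / that follows i-abc123.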


As the parts after j- and i- do not contain a forward slash, you could use a negated character class [^\/]+ instead of .* to match any char except a forward slash. That way those parts cannot run past the next /.

Then use the negative lookahead (?!applications\b) right after matching the forward slash that follows the i- part: \/(?!applications\b)

^elasticmapreduce\/j-[^\/]+\/node\/i-[^\/]+\/(?!applications\b)[^\/]*(?:\/.*)?$
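
As a quick sanity check, this sketch runs the pattern above over the paths from the question. Again, Python's re module and the names FIXED, paths and kept are just my assumptions for the example.

```python
import re

FIXED = re.compile(
    r"^elasticmapreduce\/j-[^\/]+\/node\/i-[^\/]+\/(?!applications\b)[^\/]*(?:\/.*)?$"
)

# The example paths from the question.
paths = [
    "elasticmapreduce/j-abc123/node/i-abc123/applications/hadoop-yarn/hadoop-yarn-proxyserver-ip.log.2020-05-07-00.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/applications/hadoop-yarn/hadoop-yarn-timelineserver-ip.out.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/applications/hadoop-yarn/hadoop-yarn-proxyserver-ip.log.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/applications/hive/user/hive/hive.log.2020-05-07.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/applications",
    "elasticmapreduce/j-abc123/node/i-abc123/bootstrap-actions/master.log.2020-05-07-00.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/bootstrap-actions",
    "elasticmapreduce/j-abc123/node/i-abc123/daemons/instance-state/instance-state.log-2020-05-08-13-30.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/daemons/setup-dns.log.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/provision-node/abc123/stderr.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/provision-node/apps-phase/0/abc123/stderr.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/provision-node/reports/0/abc123/ip.ec2.internal/201805270306.yaml.gz",
    "elasticmapreduce/j-abc123/node/i-abc123/setup-devices/setup_var_log_dir.log.gz",
]

# Keep only the paths that are outside the applications directory.
kept = [p for p in paths if FIXED.match(p)]
print(len(kept))  # 8
for p in kept:
    print(p)
```

One thing to be aware of: \b matches between the s and any non-word character, so a sibling directory such as applications-old (a made-up name) would be excluded as well, while applications2 would be kept.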

Note: If you don't want to cross newlines, you could use [^\/\r\n]+ instead, as [^\/] also matches newline characters.
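
To illustrate the newline remark, here is a small sketch with a multiline input. The first line of the input is made up (it is not one of the paths in the question), and applying the newline exclusion to the [^\/]* part as well is my own assumption.

```python
import re

ANY_BUT_SLASH = r"^elasticmapreduce\/j-[^\/]+\/node\/i-[^\/]+\/(?!applications\b)[^\/]*(?:\/.*)?$"
NO_NEWLINES = r"^elasticmapreduce\/j-[^\/\r\n]+\/node\/i-[^\/\r\n]+\/(?!applications\b)[^\/\r\n]*(?:\/.*)?$"

# Hypothetical input: the first line has no "/" after the i- part.
text = (
    "elasticmapreduce/j-abc123/node/i-abc123\n"
    "elasticmapreduce/j-abc123/node/i-abc123/daemons/setup-dns.log.gz"
)

# [^\/]+ runs across the line break, so both lines end up in a single match.
crossing = re.findall(ANY_BUT_SLASH, text, re.MULTILINE)
print(len(crossing), "\n" in crossing[0])  # 1 True

# With \r\n excluded, only the second line matches, on its own.
strict = re.findall(NO_NEWLINES, text, re.MULTILINE)
print(strict)
# ['elasticmapreduce/j-abc123/node/i-abc123/daemons/setup-dns.log.gz']
```

This only matters when several paths are tested at once with the multiline flag, as is common on regex101. If each path is matched as a separate string, both variants behave the same.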

Upvotes: 1
