TheSoftwareJedi

Reputation: 35246

Need a regex to exclude certain strings

I'm trying to get a regex that will match:

somefile_1.txt
somefile_2.txt
somefile_{anything}.txt

but not match:

somefile_16.txt

I tried

somefile_[^(16)].txt

with no luck (it includes even the "16" record)

Upvotes: 8

Views: 37849

Answers (6)

phihag

Reputation: 288280

Some regex libraries allow lookahead:

somefile_(?!16\.txt$).*?\.txt
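A quick check of that lookahead pattern, sketched with Python's re module. It is written with the underscore so the lookahead sits right after somefile_, and it assumes whole file names are tested (hence re.fullmatch). The sample names are made up:

```python
import re

# Negative lookahead right after "somefile_": fail only if the rest is exactly "16.txt"
pattern = re.compile(r"somefile_(?!16\.txt$).*?\.txt")

names = ["somefile_1.txt", "somefile_2.txt", "somefile_abc.txt",
         "somefile_16.txt", "somefile_160.txt"]
# fullmatch requires the whole name to match, not just a prefix
matched = [n for n in names if pattern.fullmatch(n)]
print(matched)  # ['somefile_1.txt', 'somefile_2.txt', 'somefile_abc.txt', 'somefile_160.txt']
```

somefile_160.txt is kept because the lookahead only rejects the exact remainder 16.txt.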

Otherwise, you can still use multiple character classes:

somefile_([^1].|1[^6]|.|.{3,})\.txt

or, to achieve maximum portability:

somefile_([^1].|1[^6]|.|....*)\.txt
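The portable alternation can be checked the same way. This sketch again assumes Python's re and invented sample names. Each branch of the group covers one length or shape of the part after somefile_, and none of them can produce exactly 16:

```python
import re

# Lookahead-free version; the group after "somefile_" must be one of:
#   [^1].  two characters, the first not "1"
#   1[^6]  two characters, "1" then anything but "6"
#   .      exactly one character
#   ....*  three or more characters
portable = re.compile(r"somefile_([^1].|1[^6]|.|....*)\.txt")

expected = {
    "somefile_1.txt": True,
    "somefile_12.txt": True,
    "somefile_ab.txt": True,
    "somefile_1666.txt": True,
    "somefile_16.txt": False,  # the only two-character remainder no branch accepts
}
results = {name: bool(portable.fullmatch(name)) for name in expected}
print(results == expected)  # True
```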

[^(16)] means: match any single character except parentheses, 1, and 6. It is a character class, not "anything but the string 16".
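A small Python sketch of what the character class actually does, using re.fullmatch so each test covers the whole string (the test strings are invented):

```python
import re

# [^(16)] is a character class: it matches exactly ONE character that is
# not "(", "1", "6" or ")". It does not mean "anything except the string 16".
char_class = re.compile(r"[^(16)]")
kept = [c for c in "a1(6)x" if char_class.fullmatch(c)]
print(kept)  # ['a', 'x']

# So the attempt from the question allows only a single character after
# "somefile_", and its unescaped "." matches any character before "txt".
attempt = re.compile(r"somefile_[^(16)].txt")
names = ["somefile_2.txt", "somefile_1.txt", "somefile_23.txt"]
accepted = {n: bool(attempt.fullmatch(n)) for n in names}
print(accepted)  # {'somefile_2.txt': True, 'somefile_1.txt': False, 'somefile_23.txt': False}
```

Both somefile_1.txt and somefile_23.txt are names the question wants to match, which is why a single character class cannot express "not 16".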

Upvotes: 12

Piotr Lesnicki

Reputation: 9740

To follow your specification strictly (and to be picky), you should rather use:

^somefile_(?!16\.txt$).*\.txt$

so that somefile_1666.txt, which is also {anything}, can still be matched ;)
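With the ^ and $ anchors the pattern describes the whole name, so even an unanchored search such as Python's re.search cannot match part of a longer name. A sketch with made-up names:

```python
import re

# ^ and $ pin the pattern to the whole string, so re.search cannot match a substring
strict = re.compile(r"^somefile_(?!16\.txt$).*\.txt$")

names = ["somefile_16.txt", "somefile_1666.txt", "somefile_7.txt", "old_somefile_7.txt"]
kept = [n for n in names if strict.search(n)]
print(kept)  # ['somefile_1666.txt', 'somefile_7.txt']
```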

But sometimes it is simply more readable to use:

ls | grep -e '^somefile_.*\.txt$' | grep -v -e '^somefile_16\.txt$'
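The same two-stage idea (keep a superset, then drop the exception) as a Python sketch, in case you are filtering names inside a program rather than in a shell. The helper name two_stage_filter and the sample list are hypothetical, and re.fullmatch plays the role of the ^...$ anchors:

```python
import re

# Stage 1 keeps the superset (like the first grep); stage 2 drops the exception (like grep -v)
include = re.compile(r"somefile_.*\.txt")
exclude = re.compile(r"somefile_16\.txt")

def two_stage_filter(names):
    """Hypothetical helper: keep full matches of include, then drop full matches of exclude."""
    superset = [n for n in names if include.fullmatch(n)]
    return [n for n in superset if not exclude.fullmatch(n)]

# Fixed sample list; in a real script the names might come from os.listdir(".")
sample = ["somefile_1.txt", "somefile_16.txt", "somefile_x.txt", "notes.md"]
print(two_stage_filter(sample))  # ['somefile_1.txt', 'somefile_x.txt']
```

Each stage stays a plain regex, so neither needs lookahead support.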

Upvotes: 4

Douglas Mayle

Reputation: 21755

The best solution has already been mentioned:

somefile_(?!16\.txt$).*\.txt

This works, and .* is greedy enough to take anything that follows on the same line. If, however, you know that you want a valid file name, I'd suggest also excluding characters that are invalid in file names:

somefile_(?!16)[^?%*:|"<>]*\.txt
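A sketch of how the character restriction behaves, assuming Python's re (the sample names are invented). A forbidden character anywhere in the variable part makes the whole name fail:

```python
import re

# Lookahead as written in this answer, plus a class that refuses ? % * : | " < >
valid = re.compile(r'somefile_(?!16)[^?%*:|"<>]*\.txt')

names = ["somefile_ok.txt", "somefile_a?b.txt", 'somefile_"q".txt', "somefile_16.txt"]
checks = {n: bool(valid.fullmatch(n)) for n in names}
print(checks)
```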

If you're working with a regex engine that does not support lookahead, you'll have to emulate that (?!16) some other way. You can split the names into two groups: those that start with 1 not followed by 6, and those that start with anything else:

somefile_(1[^6]|[^1]).*\.txt
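A sketch of the lookahead-free split in Python (sample names invented). It also shows a side effect: somefile_1.txt is rejected, because a name starting with 1 needs a second character before .txt. The combined patterns further down handle that with a separate |1 branch:

```python
import re

# No lookahead: the first character is "1" followed by anything but "6", or is not "1" at all
split = re.compile(r"somefile_(1[^6]|[^1]).*\.txt")

names = ["somefile_2.txt", "somefile_12.txt", "somefile_1.txt",
         "somefile_16.txt", "somefile_160.txt"]
results = {n: bool(split.fullmatch(n)) for n in names}
print(results)
```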

If you want to allow somefile_16_stuff.txt but NOT somefile_16.txt, the regexes above are not enough. You'll need to set your limit differently:

somefile_(16.|1[^6]|[^1]).*\.txt
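Checking the difference in a short Python sketch (names invented): 16 is now accepted as long as at least one more character follows it before .txt.

```python
import re

# "16" is accepted only when at least one more character follows it before ".txt"
single = re.compile(r"somefile_(16.|1[^6]|[^1]).*\.txt")

names = ["somefile_16_stuff.txt", "somefile_160.txt", "somefile_16.txt", "somefile_7.txt"]
results = {n: bool(single.fullmatch(n)) for n in names}
print(results)
```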

Combining all of this, you end up with two possibilities: one that blocks only the single instance (somefile_16.txt), and one that blocks the whole family (somefile_16*.txt). I suspect you want the first one:

somefile_((16[^?%*:|"<>]|1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt
somefile_((1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt
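A Python sketch comparing the two combined patterns on a few invented names; each value shows whether the single-instance pattern and the family pattern accept the name:

```python
import re

# Blocks only somefile_16.txt (and refuses ? % * : | " < > in the variable part)
only_16 = re.compile(r'somefile_((16[^?%*:|"<>]|1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt')
# Blocks the whole somefile_16*.txt family
family_16 = re.compile(r'somefile_((1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt')

names = ["somefile_1.txt", "somefile_16.txt", "somefile_16_a.txt", "somefile_a*b.txt"]
# Each value is (accepted by only_16, accepted by family_16)
results = {n: (bool(only_16.fullmatch(n)), bool(family_16.fullmatch(n))) for n in names}
print(results)
```

The |1 branch is what lets somefile_1.txt through in both versions.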

Here are the same two without the special-character restriction, which makes them easier to read:

somefile_((16.|1[^6]|[^1]).*|1)\.txt
somefile_((1[^6]|[^1]).*|1)\.txt

Upvotes: 6

Pierre

Reputation: 2866

Without using lookahead

somefile_(|.|[^1].+|10|11|12|13|14|15|17|18|19|.{3,})\.txt

Read it as: somefile_ followed by one of:

  1. nothing.
  2. one character.
  3. any character except 1, followed by one or more further characters.
  4. 10 to 19, with 16 left out.
  5. three or more characters.

and finally followed by .txt.
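A Python sketch of this enumeration, with the dot before txt escaped as \.txt (an unescaped . matches any character). The invented sample also shows a limit of the enumeration: a two-character name starting with 1, such as somefile_1a.txt, is only accepted when its second character is a digit other than 6.

```python
import re

# The enumeration from this answer, with the dot before "txt" escaped
enum = re.compile(r"somefile_(|.|[^1].+|10|11|12|13|14|15|17|18|19|.{3,})\.txt")

names = ["somefile_.txt", "somefile_1.txt", "somefile_15.txt", "somefile_16.txt",
         "somefile_1a.txt", "somefile_123.txt"]
results = {n: bool(enum.fullmatch(n)) for n in names}
print(results)
```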

Upvotes: 1

Bryan Oakley

Reputation: 386352

Sometimes it's just easier to use two regular expressions: first look for everything you want, then ignore everything you don't. I do this all the time on the command line, piping the output of a regex that matches a superset into another regex that filters out the stuff I don't want.

If the goal is to get the job done rather than find the perfect regex, consider that approach. It's often much easier to write and understand than a regex that makes use of exotic features.

Upvotes: 2

Julien Hoarau

Reputation: 50000

somefile_(?!16).*\.txt

(?!16) means: Assert that it is impossible to match the regex "16" starting at that position.
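Because (?!16) only looks at the next two characters, it rejects every name whose variable part starts with 16, not just somefile_16.txt. A Python sketch comparing it with the stricter (?!16\.txt$) used in other answers (names invented):

```python
import re

family = re.compile(r"somefile_(?!16).*\.txt")       # rejects every name starting "somefile_16"
exact = re.compile(r"somefile_(?!16\.txt$).*\.txt")  # rejects only somefile_16.txt

names = ["somefile_16.txt", "somefile_160.txt", "somefile_7.txt"]
# Each value is (accepted by family, accepted by exact)
results = {n: (bool(family.fullmatch(n)), bool(exact.fullmatch(n))) for n in names}
print(results)
```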

Upvotes: 3
