Steve Gietz

Reputation: 71

Regex to exclude certain file extensions

Similar questions have been asked, but they miss one thing I need to do, and I can't figure it out.

I need to find all files that do NOT have a tif or tiff extension, but I DO need to find all others, including those that have no extension. I got the first part working with the regex below, but it doesn't match files with no extension.

^(.+)\.(?!tif$|tiff$).+$

That works great for files with an extension, but I need all of the following to work:

filename.ext MATCH
filename.abc MATCH
filename.tif FAIL
filename     MATCH

Thanks :)

Upvotes: 6

Views: 9213

Answers (5)

Mike

Reputation: 638

Here's what I came up with:

^[^\.\s]+(\.|\s)(?!tiff?)

Explanation:

The first part matches from the beginning of the line up to the first dot or whitespace character. Put your capturing group around this part, i.e.:

^(?<result>[^\.\s]+)

It then looks for a dot or a whitespace character, followed by a negative lookahead on the extension (tiff? matches both tif and tiff).

This assumes that a dot or a whitespace character always follows the filename. If the filename can instead be followed by the end of the line, change it to one of these:

^[^\.\s]+(\.(?!tiff?)|\n)   linux
^[^\.\s]+(\.(?!tiff?)|\r\n) windows

Upvotes: 0

Shakiba Moshiri

Reputation: 23794

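If the file names are stored in a text file, one per line:

perl -lne '/\.tiff?$/ || print' file

If they are files in a directory:

ls | perl -lne '/\.tiff?$/ || print'

The pattern is anchored to the extension (\.tiff?$) so that names that merely contain "tif", such as notification.txt, are still printed.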

Upvotes: 0

Cyril Lemaire

Reputation: 192

This works for me:

^(?:(.+)\.((?!tif$|tiff$)[^.]*)|[^.]+)$

The regex is split into two parts:

Part 1: (.+)\.((?!tif$|tiff$)[^.]*)

  • (.+) (1st capturing group) Match a filename (potentially containing dots)
  • \. Match the last dot of the string (preceding the extension).
  • ((?!tif$|tiff$)[^.]*) (2nd capturing group) Check that the dot is not followed by exactly "tif" or "tiff"; if that holds, match the extension.

Part 2: [^.]+ If part 1 didn't match, check if you have just a filename containing no dot.
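
A quick way to check this pattern against the cases from the question is a small Python sketch. The helper name is_wanted and the extra sample names are made up for illustration, and the pattern is written with the dot outside the first group, as in the Part 1 breakdown above. The match is case-sensitive, so an upper-case extension is not treated as tif here.

```python
import re

# Pattern from this answer: either "name.ext" where ext is not exactly
# tif/tiff, or a name that contains no dot at all.
PATTERN = re.compile(r"^(?:(.+)\.((?!tif$|tiff$)[^.]*)|[^.]+)$")


def is_wanted(name: str) -> bool:
    """Return True when the name does not end in .tif or .tiff."""
    return PATTERN.match(name) is not None


if __name__ == "__main__":
    samples = [
        "filename.ext",
        "filename.abc",
        "filename.tif",
        "filename.tiff",
        "filename",
        "archive.tar.gz",
    ]
    for name in samples:
        print(name, "MATCH" if is_wanted(name) else "FAIL")
```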

Upvotes: 1

Thomas Ayoub

Reputation: 29431

If you're not working with JS/ECMAScript regex, you can use:

^.*(?<!\.tif)(?<!\.tiff)$
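
Python's re module is one engine that accepts fixed-width lookbehinds like these, so the pattern can be tried there as a rough sketch. The function name keep is hypothetical. Note that the pattern also accepts an empty string, which may or may not matter for your input.

```python
import re

# Same pattern as above: any string whose end is not preceded
# by ".tif" or ".tiff".
LOOKBEHIND = re.compile(r"^.*(?<!\.tif)(?<!\.tiff)$")


def keep(name: str) -> bool:
    """Return True when the name does not end in .tif or .tiff."""
    return LOOKBEHIND.match(name) is not None


if __name__ == "__main__":
    for name in ["filename.ext", "filename.tif", "filename.tiff", "filename"]:
        print(name, "MATCH" if keep(name) else "FAIL")
```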

Upvotes: 4

user1919238

Reputation:

Rather than writing a negative regex, consider using the simpler, positive regex, but taking action when something does not match. This is often a superior approach.

It can't be used in every situation (e.g. if you are using a command line tool that requires you to specify what does match), but I would do this where possible.
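
As a minimal sketch of that idea in Python (the names TIFF_EXT and non_tiff are made up for illustration): match the tif/tiff extension with a plain positive regex and keep whatever does not match.

```python
import re

# Positive pattern: the name ends in .tif or .tiff (case-sensitive).
TIFF_EXT = re.compile(r"\.tiff?$")


def non_tiff(names):
    """Keep every name that the positive pattern does NOT match."""
    return [name for name in names if not TIFF_EXT.search(name)]


if __name__ == "__main__":
    print(non_tiff(["filename.ext", "filename.abc", "filename.tif", "filename"]))
```

The regex stays trivial to read, and the "not" lives in ordinary code instead of in a lookahead.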

Upvotes: 1
