Menno

Reputation: 12621

Regex for string not ending with given suffix

I have not been able to find a proper regex to match any string not ending with some condition. For example, I don't want to match anything ending with an a.

This matches

b
ab
1

This doesn't match

a
ba

I know the regex should end with $ to mark the end of the string, but I don't know what should precede it.

Edit: The original example doesn't really cover my case. So: how do I handle more than one character? Say, anything not ending with ab?

I've been able to fix this, using this thread:

.*(?:(?!ab).).$

The downside is that it doesn't match a string of one character.
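To make the limitation concrete, here is a small Python sketch of the pattern above. The helper name `matches` is made up for illustration, and `fullmatch` is used on the assumption that the whole string should be tested.

```python
import re

# Pattern from the question: the second-to-last character must not start "ab",
# and one more character is required before the end of the string.
pattern = re.compile(r".*(?:(?!ab).).$")

def matches(s: str) -> bool:
    """Return True if the whole string matches the question's pattern."""
    return pattern.fullmatch(s) is not None

for s in ["xy", "abc", "ab", "cab", "b"]:
    print(repr(s), matches(s))
```

Because the pattern consumes two characters before $, a one-character string such as b can never match, even though it does not end with ab.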

Upvotes: 318

Views: 362331

Answers (10)

Argimko

Reputation: 624

If you only need to test a string and don't need to capture text, the fastest way is:

$(?<!a) // 39 steps

In this regex, $ moves the cursor to the end of the string, and then (?<!a) does a negative lookbehind to check that the character just before it is not a.

Check it here https://regex101.com/r/VZ5Aqa/1

It is especially useful for JavaScript which doesn't support possessive quantifiers and atomic groups.
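For readers outside JavaScript, the same test-only idea can be sketched in Python. The helper name `lacks_suffix` is hypothetical. Note that Python's `re` module only accepts fixed-width lookbehinds, so this sketch is limited to literal suffixes; a variable-width lookbehind such as `\.\w{2,4}` would not compile there.

```python
import re

def lacks_suffix(s: str, suffix: str) -> bool:
    """Test-only check: True if s does not end with the literal suffix."""
    # $ anchors at the end of the string; the negative lookbehind then
    # inspects the characters immediately before that position.
    pattern = r"$(?<!" + re.escape(suffix) + r")"
    return re.search(pattern, s) is not None

print(lacks_suffix("ab", "a"))    # the string ends with b
print(lacks_suffix("ba", "a"))    # the string ends with a
print(lacks_suffix("cab", "ab"))  # multi-character suffix
```

Nothing is captured here; the match itself is empty, and only its existence matters.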

Compared with stema's answer:

.*(?<!a)$ // 66 steps

But we need to run performance tests to see which one is actually faster. So let's do this …

Performance test

const regexArgimko1 =    /$(?<!\.\w{2,4})/;   //   270 ms, no text capture
const regexArgimko2 =     /(?<!\.\w{2,4})$/;  //   270 ms, no text capture, slower in Python
const regexArgimko3 = /^.*$(?<!\.\w{2,4})/;   //  1000 ms

const regexStema1   =   /.*(?<!\.\w{2,4})$/;  // 80000 ms
const regexStema2   =  /^.*(?<!\.\w{2,4})$/;  //  3000 ms

const arr = [ /* 100 elements with some filenames */ ];

var a = new Date();
for (let i = 0; i < 50000; i++)
   arr.forEach(el => regexArgimko1.test(el));
var b = new Date();
console.log(b-a);

Test environment: Windows 10, Chromium-based browser v128, DevTools

Conclusion

As you can see, $(?<!\.\w{2,4}) is about 300 times faster than .*(?<!\.\w{2,4})$ in this test.

You can read more about why some regexes are so slow here: Catastrophic backtracking

Upvotes: 0

abalter

Reputation: 10383

If you are using grep or sed, the syntax will be a little different. Notice that the sequential [^a][^b] method does not work here:

balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n'
jd8a
8$fb
q(c
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a]$"
8$fb
q(c
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^b]$"
jd8a
q(c
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^c]$"
jd8a
8$fb
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a][^b]$"
jd8a
q(c
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a][^c]$"
jd8a
8$fb
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a^b]$"
q(c
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a^c]$"
8$fb
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^b^c]$"
jd8a
balter@spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^b^c^a]$"

FWIW, I'm finding the same results in Regex101, which I think is JavaScript syntax.

Bad: https://regex101.com/r/MJGAmX/2
Good: https://regex101.com/r/LzrIBu/2
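The difference between the two spellings can be reproduced outside grep. Below is a minimal Python sketch using the same three sample lines; the helper name `grep_like` is made up for illustration.

```python
import re

lines = ["jd8a", "8$fb", "q(c"]

def grep_like(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines in which the pattern is found, like a plain grep."""
    return [line for line in lines if re.search(pattern, line)]

# Two classes in a row constrain the last TWO characters independently:
# second-to-last must not be a, and last must not be b.
two_classes = grep_like(r".*[^a][^b]$", lines)

# One class constrains only the LAST character: it must not be a, ^ or b
# (a ^ that is not first inside the brackets is a literal caret).
one_class = grep_like(r".*[^a^b]$", lines)

print(two_classes)
print(one_class)
```

So [^a^b] is a single negated class for one final character, while [^a][^b] checks two positions and therefore behaves differently.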

Upvotes: 0

MatthewRock

Reputation: 1091

The accepted answer is fine if you can use lookarounds. However, there is also another approach to solve this problem.

If we look at the widely proposed regex for this question:

.*[^a]$

We will find that it almost works. It does not accept an empty string, which might be a little inconvenient. However, this is a minor issue when dealing with just one character. If we want to exclude a whole string, e.g. "abc", then:

.*[^a][^b][^c]$

won't do. It won't accept ac, for example, because it requires at least three characters.

There is an easy solution for this problem though. We can simply say:

.{,2}$|.*[^a][^b][^c]$

or, as a more generalized version:

.{,n-1}$|.*[^firstchar][^secondchar]$ where n is the length of the string you want to forbid (for abc it's 3), and firstchar, secondchar, ... are the first, second, ..., nth characters of your string (for abc they would be a, then b, then c).

This comes from the simple observation that a string shorter than the text we want to forbid cannot contain this text by definition. So we can accept either anything that is shorter ("ab" isn't "abc"), or anything long enough but without the forbidden ending.
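It is worth checking this pattern against a few inputs. The Python sketch below uses `fullmatch` on the assumption that the whole string must match. As far as I can tell, the per-position classes also reject strings such as xbc, which do not end with abc. The second pattern, `by_mismatch`, is my own hypothetical lookaround-free variant and not part of the answer: it lists the ways a string can differ from the suffix, scanning from the last character backwards.

```python
import re

# Pattern from the answer: short strings, or three trailing characters that
# each differ from the corresponding character of "abc".
per_position = re.compile(r".{,2}$|.*[^a][^b][^c]$")

# Hypothetical alternative: too short, or last char is not c, or the last two
# are "?c" with ? != b, or the last three are "?bc" with ? != a.
by_mismatch = re.compile(r".{0,2}|.*[^c]|.*[^b]c|.*[^a]bc")

for s in ["", "ac", "xyz", "xbc", "abc", "xabc"]:
    print(repr(s),
          per_position.fullmatch(s) is not None,
          by_mismatch.fullmatch(s) is not None)
```

Both patterns reject abc and xabc and accept ac; they differ on inputs like xbc, where only one trailing position coincides with the forbidden suffix.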

Here's an example of a find command that will delete all files that are not .jpg:

find . -regex '.{,3}$|.*[^.][^j][^p][^g]$' -delete

Upvotes: 5

thomas

Reputation: 200

The question is old, but since I could not find a better solution, I'm posting mine here. I wanted to find all USB drives without listing their partitions, i.e. to remove the entries ending in "part[0-9]" from the results. I ended up chaining two greps, the last of which negates the match:

ls -1 /dev/disk/by-path/* | grep -P "\-usb\-" | grep -vE "part[0-9]*$"

On my system, this results in:

pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0

If I only want the partitions I could do:

ls -1 /dev/disk/by-path/* | grep -P "\-usb\-" | grep -E "part[0-9]*$"

Where I get:

pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0-part1
pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0-part2

And when I do:

readlink -f /dev/disk/by-path/pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0

I get:

/dev/sdb

Upvotes: 1

Philipp

Reputation: 4729

To search for files not ending with ".tmp" we use the following regex:

^(?!.*[.]tmp$).*$

Testing it with the Regex Tester gives the following result:

[Image: Regex Tester result]
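The same lookahead pattern can be tried in Python. This is a minimal sketch; the file names are invented for illustration.

```python
import re

# Negative lookahead at the start: fail if ".tmp" sits at the very end.
not_tmp = re.compile(r"^(?!.*[.]tmp$).*$")

# Hypothetical sample names, not taken from the answer.
names = ["report.txt", "cache.tmp", "tmp", "archive.tmp.gz", ".tmp"]

kept = [name for name in names if not_tmp.match(name)]
print(kept)
```

Only names whose final four characters are .tmp are filtered out; a name that merely contains .tmp somewhere in the middle is kept.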

Upvotes: 82

stema

Reputation: 92976

You don't give us the language, but if your regex flavour supports lookbehind assertions, this is what you need:

.*(?<!a)$

(?<!a) is a negative lookbehind assertion that ensures that there is no "a" immediately before the end of the string (or of the row, with the m modifier).

See it here on Regexr

You can also easily extend this to other strings, since this checks for a string and isn't a character class.

.*(?<!ab)$

This would match anything that does not end with "ab", see it on Regexr
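As an illustration of the row-wise behaviour with the m modifier, here is a small Python sketch. The leading ^ is my own addition (an assumption, not part of the answer) so that each match has to start at the beginning of a row; the sample text is invented.

```python
import re

# re.M makes ^ and $ match at the start and end of every row.
rows_not_ending_in_ab = re.compile(r"^.*(?<!ab)$", re.M)

# Hypothetical four-row input.
text = "cab\nabc\nb\nab"

found = rows_not_ending_in_ab.findall(text)
print(found)  # ['abc', 'b']
```

Unlike the test-only variants, this one captures the whole row, so the matched text can be used directly.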

Upvotes: 431

tckmn

Reputation: 59273

Use the not (^) symbol:

.*[^a]$

If you put the ^ symbol at the beginning of brackets, it means "everything except the things in the brackets." $ is simply an anchor to the end.

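For multiple characters, put each one in its own character set (note that this constrains each of the last two positions independently, so it also rejects strings such as ax or cb):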

.*[^a][^b]$

Upvotes: 119

JesperE

Reputation: 64404

Try this

/.*[^a]$/

The [] denotes a character class, and the ^ inverts the character class to match everything but an a.

Upvotes: 6

Bill

Reputation: 5764

Anything ending with a is matched by .*a$, so when you apply the regex, negate the result. Alternatively, you can use .*[^a]$, where [^a] means any character that is not a.
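Negating the result in the calling code could look like this in Python. This is a minimal sketch; the function name `does_not_end_with` is hypothetical.

```python
import re

def does_not_end_with(s: str, suffix: str) -> bool:
    """Match the positive condition (ends with suffix), then negate it."""
    # Note: in Python, $ also matches just before a single trailing newline.
    ends_with = re.search(re.escape(suffix) + r"$", s) is not None
    return not ends_with

for s in ["b", "ab", "1", "a", "ba", ""]:
    print(repr(s), does_not_end_with(s, "a"))
```

This keeps the regex itself trivial, works for suffixes of any length, and accepts the empty string, which .*[^a]$ does not.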

Upvotes: 0

Kent

Reputation: 195029

.*[^a]$

The regex above will match strings that do not end with a.

Upvotes: 10
