INS

Reputation: 10820

Combine case sensitive regex and case insensitive regex into one

I have multiple filters for files (I'm using Python). Some of them are glob filters and some are regular expressions. I have both case-sensitive and case-insensitive globs and regexes. I can transform a glob into a regular expression with fnmatch.translate.

I can combine the case sensitive regular expressions into one big regular expression. Let's call it R_sensitive.

I can combine the case insensitive regular expressions into one big regular expression (case insensitive). Let's call it R_insensitive.

Is there a way to combine R_insensitive and R_sensitive into one regular expression? The combined expression would, of course, be compiled as case sensitive.

Thanks,

Iulian

NOTE: The way I combine expressions is the following:

Having R1,R2,R3 regexes I make R = (R1)|(R2)|(R3).

EXAMPLE:

I'm searching for "*.txt" (case-insensitive glob), but I have another glob like this: "*abc*" (case sensitive). How can I programmatically combine the two regexes produced by "fnmatch.translate" when one is case insensitive and the other is case sensitive?

Upvotes: 2

Views: 886

Answers (2)

Pi Marillion

Reputation: 4674

The regex ability you describe is either ordinal modifiers or a modifier span. Python's re module supported neither at the time of writing; modifier spans are accepted from Python 3.6 onward. Here is what the two forms look like:

Ordinal Modifiers: (?i)case_insensitive_match(?-i)case_sensitive_match

Modifier Spans: (?i:case_insensitive_match)(?-i:case_sensitive_match)

On Python versions older than 3.6, they both fail to parse in re. The closest thing you could do there (for simple or small matches) would be letter groups:

[Cc][Aa][Ss][Ee]_[Ii][Nn][Ss][Ee][Nn][Ss][Ii][Tt][Ii][Vv][Ee]_[Mm][Aa][Tt][Cc][Hh]case_sensitive_match

Obviously, this approach would be best for something where the insensitive portion is very brief, so I'm afraid it wouldn't be the best choice for you.
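
For readers on Python 3.6 or later, the re module does accept the modifier-span form, so the two globs from the question can be joined directly. The following is a minimal sketch under that assumption; the variable names are made up for illustration.

```python
import fnmatch
import re

insensitive = fnmatch.translate('*.txt')   # glob that should ignore case
sensitive = fnmatch.translate('*abc*')     # glob that should respect case

# (?i:...) switches IGNORECASE on for that group only. The second branch
# stays case sensitive because the pattern is compiled without flags.
combined = re.compile('(?i:%s)|(?:%s)' % (insensitive, sensitive))

for name in ('README.TXT', 'xabcx', 'xABCx'):
    print(name, bool(combined.match(name)))
```

The first two names match (one through each branch); the last one does not, because "ABC" only satisfies the case-sensitive branch in lower case.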

Upvotes: 2

abarnert

Reputation: 365787

What you need is a way to convert a regexp that relies on the case-insensitive flag into a regexp that behaves equivalently without the flag.

To do this fully generally is going to be a nightmare.

To do this just for fnmatch results is a whole lot easier.

If you need to handle full Unicode case rules, it will still be very hard.

If you only need to handle making sure each character c also matches c.upper() and c.lower(), it's very easy.

I'm only going to explain the easy case, because it's probably what you want, given your examples, and it's easy. :)

Some modules in the Python standard library are meant to serve as sample code as well as working implementations; these modules' docs start with a link directly to their source code. And fnmatch has such a link.

If you understand regexp syntax, and glob syntax, and look at the source to the translate function, it should be pretty easy to write your own translatenocase function.

Basically: In the inner else clause for building character classes, iterate over the characters, and for each character, if c.upper() != c.lower(), append both instead of c. Then, in the outer else clause for non-special characters, if c.upper() != c.lower(), append a two-character character class consisting of those two characters.

So, translatenocase('*.txt') will return something like r'.*\.[tT][xX][tT]' instead of something like r'.*\.txt'. But normal translate('*abc*') will of course return the usual r'.*abc.*'. And you can combine these just by using an alternation, as you apparently already know how to do.
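
A rough sketch of such a translatenocase function follows. It is not the standard library's code: it is a simplified rewrite of the same idea, the helper names _case_variants and _class_body are invented here, and it assumes Python 3.6+ because it wraps the result in (?s:...)\Z. It only considers single-character c.lower() and c.upper() variants, handles ranges like [a-c] by adding a lower-cased and upper-cased copy of the range, and makes no attempt to repair malformed or mixed-case ranges.

```python
import fnmatch
import re


def _case_variants(c):
    """Return c plus its one-character lower/upper forms, without duplicates."""
    out = [c]
    for v in (c.lower(), c.upper()):
        if len(v) == 1 and v not in out:
            out.append(v)
    return out


def _class_body(stuff):
    """Expand the inside of a glob [...] class so it covers both cases."""
    parts = []
    k = 0
    while k < len(stuff):
        if k + 2 < len(stuff) and stuff[k + 1] == '-':
            lo, hi = stuff[k], stuff[k + 2]
            # Keep the range as written and add its lower/upper twins.
            ranges = [(lo, hi)]
            for fold in (str.lower, str.upper):
                a, b = fold(lo), fold(hi)
                if len(a) == 1 and len(b) == 1 and a <= b and (a, b) not in ranges:
                    ranges.append((a, b))
            parts.extend(re.escape(a) + '-' + re.escape(b) for a, b in ranges)
            k += 3
        else:
            parts.extend(re.escape(v) for v in _case_variants(stuff[k]))
            k += 1
    return ''.join(parts)


def translatenocase(pat):
    """Translate a glob into a regex that ignores case without any flag."""
    res = []
    i, n = 0, len(pat)
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            res.append('.*')
        elif c == '?':
            res.append('.')
        elif c == '[':
            # Find the closing bracket; a leading '!' or ']' is part of the class.
            j = i
            if j < n and pat[j] == '!':
                j += 1
            if j < n and pat[j] == ']':
                j += 1
            while j < n and pat[j] != ']':
                j += 1
            if j >= n:
                res.append(r'\[')  # no closing bracket: treat '[' literally
            else:
                stuff = pat[i:j]
                i = j + 1
                negate = stuff.startswith('!')
                if negate:
                    stuff = stuff[1:]
                res.append('[' + ('^' if negate else '') + _class_body(stuff) + ']')
        else:
            variants = _case_variants(c)
            if len(variants) > 1:
                res.append('[' + ''.join(variants) + ']')  # e.g. t -> [tT]
            else:
                res.append(re.escape(c))
    return '(?s:' + ''.join(res) + r')\Z'


# Combine with an ordinary, case-sensitive translate() result by alternation.
combined = re.compile('(?:%s)|(?:%s)' % (translatenocase('*.txt'),
                                         fnmatch.translate('*abc*')))
print(bool(combined.match('NOTES.TXT')), bool(combined.match('xABCx')))
```

Because the case handling is baked into the pattern text itself, the combined expression needs no flags and can be mixed freely with other case-sensitive alternatives.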

Upvotes: 1
