Grundlefleck
Grundlefleck

Reputation: 129287

Regex to match from partial or camel case string?

I would like a regular expression to match given a partial or camel cased string. For example, if the search set contains the string "MyPossibleResultString" I want to be able to match it with the likes of the following:

I'd also like to include wildcard matching, e.g.:

If it's not clear what I mean, the only example I can think of is Eclipse's "Open Type" dialog which is pretty much the exact behaviour I'm looking for. I'm not too up on the use of regexes, so I'm not sure if it matters if I'm looking for a solution in Java.

Upvotes: 3

Views: 3452

Answers (5)

ewino
ewino

Reputation: 185

Il-Bhima's answear is great, but I find that this code works better for me (pardon my C#, but it's the same):

pattern = Regex.Escape(pattern);
pattern = pattern.Replace(@"\*", ".*?");
pattern = Regex.Replace(pattern, "([A-Z][^A-Z]*)", "$1[^A-Z]*?") + ".*";

Notice the ".* " at the end that allows incomplete "startof" phrases (also allows not specifying all of the capital letters) Also, the asterisk after the "[^A-Z]*" matcher fixes problems like q4 in toolkit's answer where small letters were provided after the capital ones (they should appear directly after the capital letter, not before the next one).

Upvotes: 0

toolkit
toolkit

Reputation: 50237

You could try something like:

class RegexTransformer {
    public String fromQuery(String query) {
        StringBuilder sb = new StringBuilder();
        sb.append("^");
        sb.append(query.replaceAll("(\\p{Upper}(?!\\p{Lower}))",
                "$1\\\\p{Alpha}*?"));
        sb.append("$");
        return sb.toString();
    }
}

See Pattern API for description of negative lookahead assertions (?!pat), POSIX character classes \p{class}, and reluctant quantifiers *?.

Example test case:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class RegexTransformerTest {

    private RegexTransformer rt = new RegexTransformer();

    @Test
    public void testQueries() {
        String in = "MyPossibleResultString";
        String q1 = "MyPossibleResultString";
        String q2 = "MPRS";
        String q3 = "MPRString";
        String q4 = "MyPosResStr"; // this wont work
        String q5 = "M";

        test(in, q1, "^MyPossibleResultString$");
        test(in, q2, "^M\\p{Alpha}*?P\\p{Alpha}*?R\\p{Alpha}*?S\\p{Alpha}*?$");
        test(in, q3, "^M\\p{Alpha}*?P\\p{Alpha}*?R\\p{Alpha}*?String$");
        test(in, q5, "^M\\p{Alpha}*?$");
    }

    private void test(String in, String query, String expected) {
        assertEquals("transform", expected, rt.fromQuery(query));
        assertTrue("match", in.matches(rt.fromQuery(query)));
    }
}

Upvotes: 0

Il-Bhima
Il-Bhima

Reputation: 10880

Ok so I can't really see why you would need the wildcard feature if you can already support the matching described in the first example. This is what I put together. Given a query string query, you use a regular expression to create a regular expression:

String re = "\\b(" + query.replaceAll("([A-Z][^A-Z]*)", "$1[^A-Z]*") + ".*?)\\b";

For example the query MyPosResStr will become the regex:

\\b(My[^A-Z]*Pos[^A-Z]*Res[^A-Z]*Str[^A-Z]*.*?)\\b

You then use this regex for your matching using the Matcher.find method to get something like this:

public static String matchCamelCase(String query, String str) {
    query = query.replaceAll("\\*", ".*?");
    String re = "\\b(" + query.replaceAll("([A-Z][^A-Z]*)", "$1[^A-Z]*") + ".*?)\\b";

    System.out.println(re);
    Pattern regex = Pattern.compile(re);

    Matcher m = regex.matcher(str);

    if  (m.find()) {
        return m.group();
    } else return null;
}

This will return the first match to your camel case query in the string str.

EDIT: I have added a line to handle wildcards since in my tired stupor I didn't appreciate the need for them

Upvotes: 4

John Ellinwood
John Ellinwood

Reputation: 14531

As danbruc said, you have to generate a new regex for each new query. This code should do what you want.

public Pattern queryToPattern(String query) {
    StringBuilder sb = new StringBuilder();
    char[] chars = query.toCharArray();
    boolean incamel = false;
    for (int i=0; i < chars.length; i++) {
        if (chars[i] == '*') {
                            if (!incamel)
                    sb.append(".*");
        } else if (Character.isUpperCase(chars[i])) {
            if (incamel) {
                sb.append(".*");
            }
            sb.append(chars[i]);
            incamel = true;
        } else {
            sb.append(chars[i]);
        }

    }
    sb.append(".*");
    return Pattern.compile(sb.toString());
}

A query of: MyP*RString

Creates a pattern of: My.* P.* R.* String.*

Upvotes: 2

Daniel Br&#252;ckner
Daniel Br&#252;ckner

Reputation: 59655

It is not possible to do this with a single regular expression. You will have to build a regular expression based on the input and use this to search. It is easy to see that you cannot use a single regex - the user can search for any (cammel cased) string and so your regex needs to match any (cammel cased) string but than it is not a search anymore.

Upvotes: 1

Related Questions