Chaklader

Reputation: 169

How to correct the regex to find exact word match without being case sensitive?

I have a private method that I'm testing; it is shown below:

private boolean containsExactDrugName(String testString, String drugName) {

    Matcher m = Pattern.compile("\\b(?:" + drugName + ")\\b|\\S+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(testString);
    ArrayList<String> results = new ArrayList<>();

    while (m.find()) {
        results.add(m.group());
    }

    boolean found = results.contains(drugName);
    return found;
}

The method takes a text String and a medication name and returns a boolean. I need the match to be case-insensitive, but the last assertion of the test below is failing:

@Test
public void test_getRiskFactors_givenTextWith_Orlistat_Should_Not_Find_Medication() throws Exception {

    String drugName = "Orlistat";
    assertEquals("With Orlistat", true, containsExactDrugName("The patient is currently being treated with Orlistat", drugName));
    assertEquals("With Orlistattesee", false, containsExactDrugName("The patient is currently being treated with Orlistattesee", drugName));
    assertEquals("With abcOrlistat", false, containsExactDrugName("The patient is currently being treated with abcOrlistat", drugName));
    assertEquals("With orlistat", true, containsExactDrugName("The patient is currently being treated with orlistat", drugName));
}

In the last assertion the drug name is in lower case (orlistat), but it still needs to match the provided parameter Orlistat. I used Pattern.CASE_INSENSITIVE, but it's not working. How should I write the code properly?

Upvotes: 1

Views: 2652

Answers (2)

nbrooks

Reputation: 18233

The problem isn't mainly in your regular expression; it's in the containsExactDrugName method itself. You're doing case-insensitive matching to find drugName within the larger string, but then you look for an exact match of drugName within the resulting list of matched strings:

results.contains(drugName)

This check is not only redundant (the regex has already done the work of finding the matches), it actively breaks your function: List.contains compares elements with String.equals, which is an exact, case-sensitive comparison, so the lower-case match "orlistat" is never equal to "Orlistat". Simply get rid of that check:

private boolean containsExactDrugName(String testString, String drugName) {

    Matcher m = Pattern.compile("\\b(?:" + drugName + ")\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(testString);
    List<String> results = new ArrayList<>();

    while (m.find()) {
        results.add(m.group());
    }

    return !results.isEmpty();
}
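To illustrate why results.contains(drugName) fails for the last assertion, here is a minimal sketch (Java 9+ for List.of). The class and method names are hypothetical; containsIgnoreCase only shows what a case-insensitive membership test would look like if you ever did need to keep the list, for example to count matches.

```java
import java.util.List;

class CaseInsensitiveContains {

    // List.contains relies on String.equals, so "orlistat" is not equal to "Orlistat".
    static boolean containsExact(List<String> words, String target) {
        return words.contains(target);
    }

    // Hypothetical case-insensitive alternative: compare each element with equalsIgnoreCase.
    static boolean containsIgnoreCase(List<String> words, String target) {
        return words.stream().anyMatch(target::equalsIgnoreCase);
    }

    public static void main(String[] args) {
        List<String> matches = List.of("orlistat");
        System.out.println(containsExact(matches, "Orlistat"));      // false
        System.out.println(containsIgnoreCase(matches, "Orlistat")); // true
    }
}
```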

Actually, since you only need to know whether drugName occurs at all (you never use the individual matches or count them), the list is unnecessary, and you can simplify your method to:

private boolean containsExactDrugName(String testString, String drugName) {

    Matcher m = Pattern.compile("\\b(?:" + drugName + ")\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(testString);

    return m.find();
}
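A self-contained sketch of the simplified method, with the question's four assertions rewritten as print statements. The class name DrugNameMatcher is hypothetical, and the method is made static only so it can run from main. Wrapping drugName in Pattern.quote is an addition that is not part of the answer above: it makes the regex treat the drug name literally, so characters such as '.', '+' or '(' in a name are not interpreted as regex syntax, while CASE_INSENSITIVE still applies to the quoted text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DrugNameMatcher {

    // Returns true if drugName occurs in testString as a whole word, ignoring case.
    // Pattern.quote (an assumption, not in the answer) escapes any regex metacharacters in drugName.
    static boolean containsExactDrugName(String testString, String drugName) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(drugName) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(testString);
        return m.find();
    }

    public static void main(String[] args) {
        String drugName = "Orlistat";
        String prefix = "The patient is currently being treated with ";
        System.out.println(containsExactDrugName(prefix + "Orlistat", drugName));      // true
        System.out.println(containsExactDrugName(prefix + "Orlistattesee", drugName)); // false
        System.out.println(containsExactDrugName(prefix + "abcOrlistat", drugName));   // false
        System.out.println(containsExactDrugName(prefix + "orlistat", drugName));      // true
    }
}
```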

Edit: Your regex is also too permissive. The |\\S+ alternative matches any sequence of one or more non-whitespace characters, so the matcher returns every word in the string, including words that are not drugName. I'm not sure why you included it, but you should remove the |\\S+ part of the expression.
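A small sketch of what the |\\S+ alternative does to the collected matches. The helper allMatches is hypothetical and simply mirrors the question's while (m.find()) loop with the same flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {

    // Collects every match of the given regex in text, using the question's flags.
    static List<String> allMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(text);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "treated with abcOrlistat";
        // With "|\\S+", every whitespace-delimited token becomes a match.
        System.out.println(allMatches("\\b(?:Orlistat)\\b|\\S+", text)); // [treated, with, abcOrlistat]
        // Without it, only whole-word occurrences of the drug name match.
        System.out.println(allMatches("\\b(?:Orlistat)\\b", text));      // []
    }
}
```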

Upvotes: 2

mhasan

Reputation: 3709

You need to put (?i) at the start of the pattern (or of the part of it) that you want to make case-insensitive.

Change your regex from

"\\b(?:" + drugName + ")\\b|\\S+"

to this

"(?i)\\b(" + drugName + ")\\b|\\S+"
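For comparison, a minimal sketch showing that the embedded (?i) flag behaves the same as passing Pattern.CASE_INSENSITIVE to Pattern.compile (the embedded form of CASE_INSENSITIVE | UNICODE_CASE is (?iu)). The class and method names are hypothetical. Because the question already passes the compile flag, the results.contains(drugName) check in the question remains case-sensitive with either form.

```java
import java.util.regex.Pattern;

class InlineFlagDemo {

    // Embedded (?i) at the start of the pattern: the whole expression is case-insensitive.
    static boolean inlineFlag(String text) {
        return Pattern.compile("(?i)\\bOrlistat\\b").matcher(text).find();
    }

    // The same behaviour requested through the compile flag instead.
    static boolean compileFlag(String text) {
        return Pattern.compile("\\bOrlistat\\b", Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "treated with orlistat";
        System.out.println(inlineFlag(text));  // true
        System.out.println(compileFlag(text)); // true
    }
}
```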

Upvotes: 1
