Brian

Reputation: 1896

indexOf Case Sensitive?

Is the indexOf(String) method case sensitive? If so, is there a case insensitive version of it?

Upvotes: 93

Views: 142228

Answers (19)

Robin Davies

Reputation: 7837

I would like to lay claim to the ONE and only solution posted so far that actually works. :-)

There are three classes of problems that have to be dealt with.

  1. Non-transitive matching rules for lower and uppercase. The Turkish I problem has been mentioned frequently in other replies. According to comments in the Android source for String.regionMatches, the Georgian comparison rules require an additional conversion to lower case when comparing for case-insensitive equality.

  2. Cases where upper- and lower-case forms have a different number of letters. Pretty much all of the solutions posted so far fail in these cases. Example: German STRASSE and Straße are equal when case is ignored, but have different lengths.

  3. Binding strengths of accented characters. Locale AND context affect whether accents match or not. In French, the uppercase form of 'é' is 'E', although there is a movement toward using uppercase accents. In Canadian French, the upper-case form of 'é' is 'É', without exception. Users in both countries would expect "e" to match "é" when searching. Whether accented and unaccented characters match is locale-specific. Now consider: does "E" equal "É"? Yes. It does. In French locales, anyway.
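
The second class of problem can be reproduced with the JDK alone. Here is a minimal sketch; the class name CaseLengthDemo and its two helper methods are made up for illustration and are not part of any library:

```java
import java.util.Locale;

final class CaseLengthDemo {
    // Upper-casing can change the length of a string: in Java,
    // the single character 'ß' is upper-cased to the two characters "SS".
    static String upper(String s) {
        return s.toUpperCase(Locale.GERMAN);
    }

    // String.equalsIgnoreCase compares char by char, so two strings of
    // different lengths never match, even if a reader would call them equal.
    static boolean naiveEquals(String a, String b) {
        return a.equalsIgnoreCase(b);
    }
}
```

With these helpers, upper("Straße") returns "STRASSE" (seven chars from six), and naiveEquals("STRASSE", "Straße") returns false. Any approach that compares equal-length regions runs into the same limit.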

I am currently using android.icu.text.StringSearch to correctly reimplement my earlier case-insensitive indexOf operations.

Non-Android users can access the same functionality through the ICU4J package, using the com.ibm.icu.text.StringSearch class.

Be careful to reference classes in the correct icu package (android.icu.text or com.ibm.icu.text) as Android and the JRE both have classes with the same name in other namespaces (e.g. Collator).

    this.collator = (RuleBasedCollator)Collator.getInstance(locale);
    this.collator.setStrength(Collator.PRIMARY);

    ....

    StringSearch search = new StringSearch(
         pattern,
         new StringCharacterIterator(targetText),
         collator);
    int index = search.first();
    if (index != StringSearch.DONE)
    {
        // remember that the match length may NOT equal the pattern length.
        length = search.getMatchLength();
        .... 
    }

Test Cases (Locale, pattern, target text, expectedResult):

    testMatch(Locale.US,"AbCde","aBcDe",true);
    testMatch(Locale.US,"éèê","EEE",true);

    testMatch(Locale.GERMAN,"STRASSE","Straße",true);
    testMatch(Locale.FRENCH,"éèê","EEE",true);
    testMatch(Locale.FRENCH,"EEE","éèê",true);
    testMatch(Locale.FRENCH,"éèê","ÉÈÊ",true);

    testMatch(new Locale("tr", "TR"),"TITLE","tıtle",true);  // Turkish dotless I/i
    testMatch(new Locale("tr", "TR"),"TİTLE","title",true);  // Turkish dotted I/i
    testMatch(new Locale("tr", "TR"),"TITLE","title",false);  // Dotless-I != dotted i.

PS: As best as I can determine, the PRIMARY binding strength should do the right thing when locale-specific rules differentiate between accented and non-accented characters according to dictionary rules, but I don't know which locale to use to test this premise. Donated test cases would be gratefully appreciated.

--

Copyright notice: because StackOverflow's CC BY-SA copyrights as applied to code fragments are unworkable for professional developers, these fragments are dual licensed under more appropriate licenses here: https://pastebin.com/1YhFWmnU

Upvotes: 2

Ernie Thomason

Reputation: 1709

Here's a version closely resembling Apache's StringUtils version:

public int indexOfIgnoreCase(String str, String searchStr) {
    return indexOfIgnoreCase(str, searchStr, 0);
}

public int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
    // https://stackoverflow.com/questions/14018478/string-contains-ignore-case/14018511
    if(str == null || searchStr == null) return -1;
    if (searchStr.length() == 0) return fromIndex;  // empty string found; use same behavior as Apache StringUtils
    final int endLimit = str.length() - searchStr.length() + 1;
    for (int i = fromIndex; i < endLimit; i++) {
        if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
    }
    return -1;
}

Upvotes: 0

Jawwad Rafiq

Reputation: 1

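// C# (.NET): returns the word that starts at the first case-insensitive match of b.
static string Search(string factMessage, string b)
{
    int index = factMessage.IndexOf(b, StringComparison.CurrentCultureIgnoreCase);
    if (index == -1)
    {
        return "not matched";
    }

    // Copy characters up to the next space, or to the end of the string.
    string line = "";
    for (int i = index; i < factMessage.Length && factMessage[i] != ' '; i++)
    {
        line += factMessage[i];
    }
    return line;
}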

Upvotes: 0

max

Reputation: 10464

Just to sum it up, 3 solutions:

  • using toLowerCase() or toUpperCase()
  • using StringUtils from Apache Commons Lang
  • using regex

Now, what I was wondering is which one is the fastest. I'm guessing that, on average, it is the first one.
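
The regex option is the only one of the three without code in this answer, so here is a minimal sketch of it using java.util.regex. The class name RegexIndexOf and the method name are made up for illustration, and no claim is made about its speed relative to the other two:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexIndexOf {
    // Returns the index of the first case-insensitive match of needle in
    // haystack, or -1 if there is none.
    static int indexOfIgnoreCase(String haystack, String needle) {
        // Pattern.quote turns the needle into a literal, so characters such
        // as "." or "*" in it are not treated as regex syntax.
        // UNICODE_CASE extends case-insensitive matching beyond US-ASCII.
        Pattern p = Pattern.compile(Pattern.quote(needle),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(haystack);
        return m.find() ? m.start() : -1;
    }
}
```

Because the match runs against the original string, the returned index refers to the original string. If the same needle is searched for repeatedly, compiling the Pattern once and reusing it would likely be the first thing to try.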

Upvotes: 1

Zach Vorhies

Reputation: 199

Here is my solution which does not allocate any heap memory, therefore it should be significantly faster than most of the other implementations mentioned here.

public static int indexOfIgnoreCase(final String haystack,
                                    final String needle) {
    if (needle.isEmpty() || haystack.isEmpty()) {
        // Fallback to legacy behavior.
        return haystack.indexOf(needle);
    }

    for (int i = 0; i < haystack.length(); ++i) {
        // Early out, if possible.
        if (i + needle.length() > haystack.length()) {
            return -1;
        }

        // Attempt to match substring starting at position i of haystack.
        int j = 0;
        int ii = i;
        while (ii < haystack.length() && j < needle.length()) {
            char c = Character.toLowerCase(haystack.charAt(ii));
            char c2 = Character.toLowerCase(needle.charAt(j));
            if (c != c2) {
                break;
            }
            j++;
            ii++;
        }
        // Walked all the way to the end of the needle, return the start
        // position that this was found.
        if (j == needle.length()) {
            return i;
        }
    }

    return -1;
}

And here are the unit tests that verify correct behavior.

@Test
public void testIndexOfIgnoreCase() {
    assertThat(StringUtils.indexOfIgnoreCase("A", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("A", "a"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "a"), is(0));

    assertThat(StringUtils.indexOfIgnoreCase("a", "ba"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("ba", "a"), is(1));

    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", " Royal Blue"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase(" Royal Blue", "Royal Blue"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "royal"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "oyal"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "al"), is(3));
    assertThat(StringUtils.indexOfIgnoreCase("", "royal"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", ""), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BLUE"), is(6));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BIGLONGSTRING"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "Royal Blue LONGSTRING"), is(-1));  
}

Upvotes: 17

Bernd S

Reputation: 1308

The first question has already been answered many times. Yes, the String.indexOf() methods are all case-sensitive.

If you need a locale-sensitive indexOf() you could use the Collator. Depending on the strength value you set you can get case insensitive comparison, and also treat accented letters as the same as the non-accented ones, etc. Here is an example of how to do this:

private int indexOf(String original, String search) {
    Collator collator = Collator.getInstance();
    collator.setStrength(Collator.PRIMARY);
    for (int i = 0; i <= original.length() - search.length(); i++) {
        if (collator.equals(search, original.substring(i, i + search.length()))) {
            return i;
        }
    }
    return -1;
}
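
To see what the strength value changes, here is a small sketch using java.text.Collator with Locale.US. The class name CollatorStrengthDemo and the helper same are made up for illustration; the behaviour described below is what I expect from the JDK's default collation rules, and other locales may differ:

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorStrengthDemo {
    // Compares two strings with java.text.Collator at the given strength
    // (Collator.PRIMARY, Collator.SECONDARY or Collator.TERTIARY).
    static boolean same(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.equals(a, b);
    }
}
```

At PRIMARY strength, same("abc", "ABC", ...) and same("e", "é", ...) are both true. At SECONDARY strength, case is still ignored but "e" and "é" no longer match. At TERTIARY strength, "abc" and "ABC" are different as well.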

Upvotes: 2

phil

Reputation: 37

Had the same problem. I tried regular expressions and the Apache StringUtils.indexOfIgnoreCase method, but both were pretty slow... So I wrote a short method myself:

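public static int indexOfIgnoreCase(final String chkstr, final String searchStr, int i) {
    if (chkstr == null || searchStr == null || i < 0) {
        return -1;
    }
    final int searchStrLength = searchStr.length();
    if (searchStrLength == 0) {
        // Same result as String.indexOf("", i); also avoids indexing the empty arrays below.
        return Math.min(i, chkstr.length());
    }
    // Note: assumes toUpperCase()/toLowerCase() do not shorten the search string.
    final char[] searchCharLc = new char[searchStrLength];
    final char[] searchCharUc = new char[searchStrLength];
    searchStr.toUpperCase().getChars(0, searchStrLength, searchCharUc, 0);
    searchStr.toLowerCase().getChars(0, searchStrLength, searchCharLc, 0);
    int j = 0; // number of search characters matched so far
    for (int checkStrLength = chkstr.length(); i < checkStrLength; i++) {
        final char charAt = chkstr.charAt(i);
        if (charAt == searchCharLc[j] || charAt == searchCharUc[j]) {
            if (++j == searchStrLength) {
                return i - j + 1;
            }
        } else { // faster than: else if (j != 0) {
            // Mismatch: restart one position after the start of the partial match.
            i = i - j;
            j = 0;
        }
    }
    return -1;
}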

According to my tests it's much faster (at least if your search string is rather short). If you have any suggestions for improvement, or find any bugs, it would be nice to let me know (since I use this code in an application ;-)

Upvotes: 2

Jakub Vrána

Reputation: 684

Converting both strings to lower case is usually not a big deal, but it would be slow if either of the strings is long. And if you do this in a loop, it would be really bad. For this reason, I would recommend indexOfIgnoreCase.

Upvotes: 0

jjnguy

Reputation: 138922

Yes, indexOf is case sensitive.

The best way I have found to do case insensitivity is:

String original;
int idx = original.toLowerCase().indexOf(someStr.toLowerCase());

That will do a case insensitive indexOf().

Upvotes: 17

deepika

Reputation: 421

There is an ignore-case method in the StringUtils class of the Apache Commons Lang library:

indexOfIgnoreCase(CharSequence str, CharSequence searchStr)

Upvotes: 23

Joey

Reputation: 354694

The indexOf() methods are all case-sensitive. You can make them (roughly, in a broken way, but working for plenty of cases) case-insensitive by converting your strings to upper/lower case beforehand:

s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
s1.indexOf(s2);
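
One reason to pass an explicit Locale, as above, is that the result otherwise depends on the JVM's default locale. A minimal sketch of the difference; the class name LocaleLowerCaseDemo and the extra locale parameter are made up for illustration:

```java
import java.util.Locale;

final class LocaleLowerCaseDemo {
    // Case-insensitive indexOf via lower-casing both sides with an explicit
    // locale, so the result does not depend on the JVM default locale.
    static int indexOfIgnoreCase(String haystack, String needle, Locale locale) {
        return haystack.toLowerCase(locale).indexOf(needle.toLowerCase(locale));
    }
}
```

With Locale.ROOT or Locale.US, searching "TITLE" for "title" finds it at index 0. With Locale.forLanguageTag("tr-TR") the same search returns -1, because Turkish lower-casing maps 'I' to the dotless 'ı'.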

Upvotes: 80

Nick Lewis

Reputation: 4230

Yes, it is case-sensitive. You can do a case-insensitive indexOf by converting your String and the String parameter both to upper-case before searching.

String str = "Hello world";
String search = "hello";
str.toUpperCase().indexOf(search.toUpperCase());

Note that toUpperCase may not work in some circumstances. For instance this:

String str = "Feldbergstraße 23, Mainz";
String find = "mainz";
int idxU = str.toUpperCase().indexOf (find.toUpperCase ());
int idxL = str.toLowerCase().indexOf (find.toLowerCase ());

idxU will be 20, which is wrong! idxL will be 19, which is correct. What's causing the problem is that toUpperCase() converts the "ß" character into TWO characters, "SS", and this throws the index off.

Consequently, always stick with toLowerCase()

Upvotes: 11

toolkit

Reputation: 50257

What are you doing with the index value once returned?

If you are using it to manipulate your string, then could you not use a regular expression instead?

import static org.junit.Assert.assertEquals;    
import org.junit.Test;

public class StringIndexOfRegexpTest {

    @Test
    public void testNastyIndexOfBasedReplace() {
        final String source = "Hello World";
        final int index = source.toLowerCase().indexOf("hello".toLowerCase());
        final String target = "Hi".concat(source.substring(index
                + "hello".length(), source.length()));
        assertEquals("Hi World", target);
    }

    @Test
    public void testSimpleRegexpBasedReplace() {
        final String source = "Hello World";
        final String target = source.replaceFirst("(?i)hello", "Hi");
        assertEquals("Hi World", target);
    }
}

Upvotes: 4

Carl Manaster

Reputation: 40356

But it's not hard to write one:

public class CaseInsensitiveIndexOfTest extends TestCase {
    public void testOne() throws Exception {
        assertEquals(2, caseInsensitiveIndexOf("ABC", "xxabcdef"));
    }

    public static int caseInsensitiveIndexOf(String substring, String string) {
        return string.toLowerCase().indexOf(substring.toLowerCase());
    }
}

Upvotes: 0

Yacoby

Reputation: 55465

Yes, I am fairly sure it is. One method of working around that using the standard library would be:

int index = str.toUpperCase().indexOf("FOO"); 

Upvotes: 2

Paul McKenzie

Reputation: 20094

@Test
public void testIndexofCaseSensitive() {
    TestCase.assertEquals(-1, "abcDef".indexOf("d") );
}

Upvotes: 1

dfa

Reputation: 116372

Is the indexOf(String) method case sensitive?

Yes, it is case sensitive:

@Test
public void indexOfIsCaseSensitive() {
    assertTrue("Hello World!".indexOf("Hello") != -1);
    assertTrue("Hello World!".indexOf("hello") == -1);
}

If so, is there a case insensitive version of it?

No, there isn't. You can convert both strings to lower case before calling indexOf:

@Test
public void caseInsensitiveIndexOf() {
    assertTrue("Hello World!".toLowerCase().indexOf("Hello".toLowerCase()) != -1);
    assertTrue("Hello World!".toLowerCase().indexOf("hello".toLowerCase()) != -1);
}

Upvotes: 44

Robbie

Reputation: 841

indexOf is case sensitive. For List.indexOf, this is because it uses the equals method to compare the elements in the list; the same goes for contains and remove. String.indexOf compares chars directly, so it is case sensitive as well.

Upvotes: -2

John Topley

Reputation: 115372

I've just looked at the source. It compares chars so it is case sensitive.

Upvotes: 2
