Connor S

Reputation: 363

How to compare strings while ignoring specific characters

I know there are similar threads on this, but most of them only involve ignoring spaces.

I have to write an app that uses some poorly written data sheets, so I often have to compare values like these: Packs, packs, Pack(s), pack(s), pack.

These should all be considered equal, since they all mean a pack. However, none of the people who made these data sheets communicated with each other, so now I get to deal with it.

How can I compare strings while ignoring parentheses, spaces, the 's' character, and also making sure everything is lowercase before comparison?

All I have right now is this:

private boolean sCompare(String s1, String s2)
{
    return s1.equalsIgnoreCase(s2);
}

Obviously it isn't much: it only compares the two strings directly while ignoring case, and I'm not sure of the proper approach to get the results I need.

The new comparison function should return true for the examples above, and false when comparing things like: Pack(s) and Case(s), Packs and Case(s), etc.
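The requirements above can be pinned down in a small, regex-free sketch. The class name `SpecCheck` and the helper `normalize` are hypothetical, and the sketch assumes that a single trailing `s` (left over after parentheses and whitespace are dropped) is the only plural marker to ignore.

```java
public class SpecCheck {

    // Hypothetical helper: lowercase, drop whitespace and parentheses,
    // then strip a single trailing 's' (assumed to be the plural marker)
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toLowerCase().toCharArray()) {
            if (!Character.isWhitespace(c) && c != '(' && c != ')') {
                sb.append(c);
            }
        }
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) == 's') {
            sb.setLength(sb.length() - 1);
        }
        return sb.toString();
    }

    static boolean sCompare(String s1, String s2) {
        return normalize(s1).equals(normalize(s2));
    }

    public static void main(String[] args) {
        String[] packs = {"Packs", "packs", "Pack(s)", "pack(s)", "pack"};
        for (String p : packs) {
            // Every variant should match the bare singular form
            System.out.println(p + " -> " + sCompare(p, "pack"));
        }
        System.out.println(sCompare("Pack(s)", "Case(s)")); // false
        System.out.println(sCompare("Packs", "Case(s)"));   // false
    }
}
```

Because only the last character is checked, an `s` inside a word (as in `Glass`) is kept, which is the main difference from approaches that remove every `s`.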

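EDIT: Using help from the best answer, I've created a function that suits my needs:

private boolean sCompare(String s1, String s2)
{
    // Character class: removes every whitespace character and every
    // 'e', 's', '(', ')', '|' and '$' ('|' and '$' are literal inside [...])
    String rx = "[\\se(s)|s$]";
    return s1.toLowerCase().replaceAll(rx, "").equals(s2.toLowerCase().replaceAll(rx, ""));
}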
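Inside square brackets, `(`, `)`, `|` and `$` lose their special meaning, so `[\\se(s)|s$]` deletes every whitespace character and every `e`, `s`, `(`, `)`, `|` and `$` anywhere in the string, not just a trailing plural. That makes `Boxes` and `Box` compare equal, but it can also equate unrelated words. A quick check of the function above (the class name and the extra inputs are illustrative):

```java
public class CharClassCheck {

    // The function from the edit above, unchanged in behavior
    static boolean sCompare(String s1, String s2) {
        String rx = "[\\se(s)|s$]";
        return s1.toLowerCase().replaceAll(rx, "").equals(s2.toLowerCase().replaceAll(rx, ""));
    }

    public static void main(String[] args) {
        System.out.println(sCompare("Pack(s)", "packs"));   // true: both become "pack"
        System.out.println(sCompare("Boxes", "Box"));       // true: both become "box"
        System.out.println(sCompare("Reams", "Rams"));      // true: both become "ram"
        System.out.println(sCompare("Pack(s)", "Case(s)")); // false: "pack" vs "ca"
    }
}
```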

Upvotes: 0

Views: 3312

Answers (3)

Jeremy Then

Reputation: 535

You can use:

s1.toLowerCase().replaceAll("\\W|s\\)?$", "").equals("pack"); // true

or, if you don't care about any other s characters in the string:

s1.toLowerCase().replaceAll("\\W|s", "").equals("pack"); // true

"\W|s\)?$" removes everything that is not a word character, plus an s (optionally followed by a closing parenthesis) at the end of the string.

If you know the only s in the word is the trailing one, you can use the simpler expression "\W|s". It removes everything that is not a word character and every s in the string.
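`replaceAll` is case-sensitive, so the input has to be lowercased before the expression runs, and for a general comparison both inputs should be normalized rather than checked against a fixed literal. A minimal sketch applying the first expression to both sides (the class name `TrailingSCompare` is hypothetical):

```java
public class TrailingSCompare {

    // \W removes any non-word character (spaces, parentheses, ...);
    // s\)?$ removes a trailing "s" or "s)" at the end of the string
    private static final String REGEX = "\\W|s\\)?$";

    static boolean sCompare(String s1, String s2) {
        return s1.toLowerCase().replaceAll(REGEX, "")
                 .equals(s2.toLowerCase().replaceAll(REGEX, ""));
    }

    public static void main(String[] args) {
        System.out.println(sCompare("Pack(s)", "Packs"));  // true
        System.out.println(sCompare("pack", "Pack (s)"));  // true
        System.out.println(sCompare("Packs", "Case(s)"));  // false
    }
}
```

At the `s` in `pack(s)`, `\W` fails, so the second alternative consumes `s)` in one match; the `(` has already been removed by `\W`.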

Upvotes: 0

Petr M

Reputation: 163

I think this answers your question. Just add another forbidden character to the set and it will filter that character too.

import java.util.Set;

public class FilterDemo {
    public static void main(String[] args) {
        // Characters to drop; extend the set to filter more (Set.of needs Java 9+)
        Set<Character> forbiddenChars = Set.of('s', '{', '}', ' ');

        String testString = "This Is{ Test} string";

        String filteredString = testString
                .toLowerCase()
                .codePoints()
                // Keep only code points that are not in the forbidden set
                .filter(character -> !forbiddenChars.contains((char) character))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint,
                        StringBuilder::append)
                .toString();
        System.out.println(filteredString); // prints "thiitettring"
    }
}
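To turn the filter into the comparison the question asks for, the same pipeline can be applied to both strings. A sketch, assuming the forbidden set for this data is `s`, `(`, `)` and space (the class and method names are hypothetical). Note that this removes every `s`, not only a trailing one, so `Case(s)` becomes `cae`.

```java
import java.util.Set;

public class FilterCompare {

    // Hypothetical forbidden set for the question's data (Set.of needs Java 9+)
    private static final Set<Character> FORBIDDEN = Set.of('s', '(', ')', ' ');

    static String filter(String s) {
        return s.toLowerCase()
                .codePoints()
                // Only BMP code points can equal a forbidden char; casting a
                // supplementary code point to char would truncate it
                .filter(cp -> !(Character.isBmpCodePoint(cp) && FORBIDDEN.contains((char) cp)))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    static boolean sCompare(String s1, String s2) {
        return filter(s1).equals(filter(s2));
    }

    public static void main(String[] args) {
        System.out.println(sCompare("Pack(s)", "packs")); // true
        System.out.println(sCompare("Packs", "Case(s)")); // false
    }
}
```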

Upvotes: 0

Not a JD

Reputation: 1902

This:

public static void main(String[] args) throws Exception {
    String REGEX = "\\(s\\)|s$";

    System.out.println("Packs".replaceAll(REGEX, "")
                              .toLowerCase());
    System.out.println("packs".replaceAll(REGEX, "")
                              .toLowerCase());
    System.out.println("Pack(s)".replaceAll(REGEX, "")
                                .toLowerCase());
    System.out.println("pack(s)".replaceAll(REGEX, "")
                                .toLowerCase());
    System.out.println("pack".replaceAll(REGEX, "")
                             .toLowerCase());
}

Yields:

pack
pack
pack
pack
pack

So this should do it:

private static boolean sCompare(String s1, String s2) {
    return discombobulate(s1).equals(discombobulate(s2));
}

private static String discombobulate(String s) {
    String REGEX = "\\(s\\)|s$";

    // Lowercase first so that a trailing upper-case "S" is also removed
    return s.toLowerCase()
            .replaceAll(REGEX, "");
}
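A quick check of this approach against the question's true and false cases. In this self-contained copy the input is lowercased before the regex runs, so an upper-case trailing `S` (for example `PACKS`) is stripped too; the class name is hypothetical. The expression does not remove spaces, so `Pack (s)` normalizes to `pack ` (with a trailing space) and would not match `pack`.

```java
public class DiscombobulateCheck {

    // Removes a literal "(s)" anywhere, or an "s" at the very end
    private static final String REGEX = "\\(s\\)|s$";

    // Lowercase first so a trailing "S" is matched by s$ as well
    static String discombobulate(String s) {
        return s.toLowerCase().replaceAll(REGEX, "");
    }

    static boolean sCompare(String s1, String s2) {
        return discombobulate(s1).equals(discombobulate(s2));
    }

    public static void main(String[] args) {
        String[] variants = {"Packs", "packs", "Pack(s)", "pack(s)", "pack", "PACKS"};
        for (String v : variants) {
            System.out.println(v + " -> " + sCompare(v, "pack")); // true for each
        }
        System.out.println(sCompare("Pack(s)", "Case(s)")); // false
        System.out.println(sCompare("Packs", "Case(s)"));   // false
    }
}
```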

Upvotes: 1
