pps

Reputation: 113

Java 8 remove duplicate strings irrespective of case from a list

How can I remove duplicate elements from a list of strings while ignoring the case of each word? For example, consider the code snippet below:

    String str = "Kobe Is is The the best player In in Basketball basketball game .";
    List<String> list = Arrays.asList(str.split("\\s"));
    list.stream().distinct().forEach(s -> System.out.print(s+" "));

This prints every word unchanged, as expected, because distinct() compares strings case-sensitively:

Kobe Is is The the best player In in Basketball basketball game .

I need the following result:

Kobe Is The best player In Basketball game .

Upvotes: 10

Views: 8511

Answers (8)

Yaniv Levi

Reputation: 476

The provided solution with TreeSet is elegant, but TreeSet also sorts the elements, which makes the solution inefficient. The code below shows a more efficient implementation that uses a HashMap and gives precedence to the string with more upper-case letters:

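import java.util.HashMap;

class SetWithIgnoreCase {
    // Maps the lower-cased form of each string to the spelling currently kept for it.
    private final HashMap<String, String> underlyingMap = new HashMap<>();

    public void put(String str) {
        String lowerCaseStr = str.toLowerCase();
        // Keep the stored spelling only if it has strictly more upper-case letters than the new one.
        underlyingMap.compute(lowerCaseStr, (k, v) -> (v == null) ? str : (compare(v, str) > 0 ? v : str));
    }

    // Positive if str1 has more upper-case letters than str2.
    // Both strings lower-case to the same key, so they are assumed to have the same length.
    private int compare(String str1, String str2) {
        int upperCaseCnt1 = 0;
        int upperCaseCnt2 = 0;
        for (int i = 0; i < str1.length(); i++) {
            upperCaseCnt1 += (Character.isUpperCase(str1.charAt(i)) ? 1 : 0);
            upperCaseCnt2 += (Character.isUpperCase(str2.charAt(i)) ? 1 : 0);
        }
        return upperCaseCnt1 - upperCaseCnt2;
    }
}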
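
The class as posted has no method for reading the surviving strings back. Here is a minimal sketch of the same idea with an accessor, assuming a LinkedHashMap is acceptable so that first-seen order is preserved. The names UpperCasePreferringDedup, upperCaseCount and dedupe are made up for illustration, and unlike the code above, a tie keeps the spelling that was seen first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the answer's class.
class UpperCasePreferringDedup {

    // Counts the upper-case characters in a string.
    static long upperCaseCount(String s) {
        return s.chars().filter(Character::isUpperCase).count();
    }

    // Keeps one spelling per lower-cased key, preferring the spelling with
    // more upper-case letters; on a tie the spelling seen first is kept.
    static List<String> dedupe(List<String> words) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String w : words) {
            byKey.merge(w.toLowerCase(), w,
                    (old, cur) -> upperCaseCount(cur) > upperCaseCount(old) ? cur : old);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("is", "Is", "the", "The", "best");
        System.out.println(String.join(" ", dedupe(words))); // Is The best
    }
}
```

Replacing a value through merge does not change a LinkedHashMap's iteration order, so "Is" stays in the position where "is" first appeared.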

Upvotes: 0

Tomasz Linkowski

Reputation: 4496

Here's a one-line solution that makes use of the jOOλ library and its Seq.distinct(Function<T,U>) method:

List<String> distinctWords = Seq.seq(list).distinct(String::toLowerCase).toList();

Result (when printed like in the question):

Kobe Is The best player In Basketball game .
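
If adding a library is not an option, roughly the same "distinct by key" behaviour can be sketched with the JDK alone. This is only an illustration, not what jOOλ does internally: the class and method names are invented, and the stateful filter assumes a sequential stream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical JDK-only counterpart to Seq.distinct(String::toLowerCase).
class JdkDistinctIgnoringCase {

    // Keeps the first occurrence of each word, comparing by lower-cased form.
    // The filter is stateful, so this is meant for sequential streams only.
    static List<String> distinctIgnoringCase(List<String> words) {
        Set<String> seen = new HashSet<>();
        return words.stream()
                .filter(w -> seen.add(w.toLowerCase())) // add() returns false for a repeated key
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "Kobe Is is The the best player In in Basketball basketball game .";
        List<String> list = Arrays.asList(str.split("\\s"));
        System.out.println(String.join(" ", distinctIgnoringCase(list)));
        // Kobe Is The best player In Basketball game .
    }
}
```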

Upvotes: 0

Holger

Reputation: 298429

Taking your question literally, to “remove duplicate strings irrespective of case from a list”, you may use

// just for constructing a sample list
String str = "Kobe Is is The the best player In in Basketball basketball game .";
List<String> list = new ArrayList<>(Arrays.asList(str.split("\\s")));

// the actual operation
TreeSet<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
list.removeIf(s -> !seen.add(s));

// just for debugging
System.out.println(String.join(" ", list));
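
One detail worth spelling out: the sample list is wrapped in new ArrayList<>(...) because the list returned by Arrays.asList is fixed-size, and as far as I know removeIf fails on it as soon as an element has to be removed. A small sketch of that difference (the class and method names are only for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class wrapping the removeIf approach in a method.
class RemoveIfOnFixedSizeList {

    // Removes case-insensitive duplicates in place, keeping first occurrences.
    static void removeDuplicatesIgnoringCase(List<String> list) {
        TreeSet<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        list.removeIf(s -> !seen.add(s));
    }

    public static void main(String[] args) {
        List<String> fixedSize = Arrays.asList("Is", "is", "The");
        try {
            removeDuplicatesIgnoringCase(fixedSize);
        } catch (UnsupportedOperationException e) {
            // The fixed-size view does not support removing elements.
            System.out.println("Arrays.asList result cannot shrink");
        }

        List<String> resizable = new ArrayList<>(fixedSize);
        removeDuplicatesIgnoringCase(resizable);
        System.out.println(resizable); // [Is, The]
    }
}
```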

Upvotes: 13

NirajT

Reputation: 47

The repeated words are not treated as duplicates because they differ in case: the first occurrence is "Basketball", with a capital B, and the second is "basketball", so the two strings are not equal. To treat them as duplicates, either convert both strings to lower case (or upper case) before comparing them, or compare them while ignoring case.
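
To make the difference concrete, here is a tiny sketch of the comparisons mentioned above. The class and the sameWord helper are invented for the example.

```java
// Hypothetical demo of case-sensitive vs. case-insensitive comparison.
class CaseComparisonDemo {

    // True if the two words are equal when case is ignored.
    static boolean sameWord(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String first = "Basketball";
        String second = "basketball";

        System.out.println(first.equals(second));                             // false
        System.out.println(first.toLowerCase().equals(second.toLowerCase())); // true
        System.out.println(sameWord(first, second));                          // true
    }
}
```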

Upvotes: 0

Ousmane D.

Reputation: 56453

Here's a fun solution to get the expected result with the use of streams.

String result = Pattern.compile("\\s")
                .splitAsStream(str)
                .collect(Collectors.collectingAndThen(Collectors.toMap(String::toLowerCase,
                        Function.identity(),
                        (l, r) -> l,
                        LinkedHashMap::new),
                        m -> String.join(" ", m.values())));

Printing result gives:

Kobe Is The best player In Basketball game .

Upvotes: 3

Robby Cornelissen

Reputation: 97272

In case you only need to get rid of consecutive duplicates, you can use a regular expression. The regex below checks for duplicated words, ignoring case.

String input = "Kobe Is is The the best player In in Basketball basketball game .";
String output = input.replaceAll("(?i)\\b(\\w+)\\s+\\1\\b", "$1");

System.out.println(output);

Which outputs:

Kobe Is The best player In Basketball game .
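
To illustrate the "consecutive only" caveat, here is a short sketch that applies the same regex to an input where one duplicate is adjacent and another is not. The class, method and sample input are made up for the example.

```java
// Hypothetical demo: the regex only collapses duplicates that are adjacent.
class ConsecutiveDuplicatesDemo {

    // Collapses a word that is immediately repeated, ignoring case,
    // into its first occurrence.
    static String collapseConsecutive(String input) {
        return input.replaceAll("(?i)\\b(\\w+)\\s+\\1\\b", "$1");
    }

    public static void main(String[] args) {
        // "the the" is adjacent and gets collapsed; the later "The" is
        // separated by "cat", so it is left alone.
        System.out.println(collapseConsecutive("the the cat The dog"));
        // the cat The dog
    }
}
```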

Upvotes: 3

maio290

Reputation: 6742

This keeps the capitalized word and removes its lower-case duplicate:

String str = "Kobe Is is The the best player In in Basketball basketball game .";
List<String> list = Arrays.asList(str.split("\\s"));
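for (int i = 1; i < list.size(); i++)
{
    // compare each word with its predecessor, ignoring case
    if (list.get(i).equalsIgnoreCase(list.get(i - 1)))
    {
        // the current word is all lower case, so blank it out
        if (list.get(i).toLowerCase().equals(list.get(i)))
        {
            list.set(i, "");
        }
        // otherwise blank out the previous word
        else
        {
            list.set(i - 1, "");
        }
    }
}

// skip the blanked-out entries so they do not leave an extra space in the output
list.stream().filter(s -> !s.isEmpty()).distinct().forEach(s -> System.out.print(s + " "));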

Upvotes: 0

Leviand

Reputation: 2805

If you don't mind losing all the capital letters in the printed output, you can do it this way:

    list.stream()
            .map(String::toLowerCase)
            .distinct()
            .forEach(s -> System.out.print(s + " "));

Output:

kobe is the best player in basketball game .

Upvotes: 1
