humazed

Reputation: 76922

Multiple distinctBy calls don't give consistent results

I have a list of strings and I'm trying to keep only the distinct ones, ignoring letter case and surrounding spaces.

Example: listOf("a", "A", "A ", "B") => listOf("a", "B")

So I tried this solution:

val list = listOf("a", "A", "A  ", "B")
        .distinctBy { it.toLowerCase() }
        .distinctBy { it.trim() }

println("list = ${list}")

outputs: list = [a, A  , B]

But when I swap the order of the distinctBy calls, it surprisingly works:

val list = listOf("a", "A", "A  ", "B")
        .distinctBy { it.trim() }
        .distinctBy { it.toLowerCase() }

println("list = ${list}")

outputs: list = [a, B]

I want the output of the second version, but why doesn't the first version work?

And when I change the order of the items, I get the same problem:

val list = listOf("a", "A  ", "A", "B")
        .distinctBy { it.toLowerCase() }
        .distinctBy { it.trim() }

println("list = ${list}")

outputs: list = [a, A  , B]

Upvotes: 1

Views: 2236

Answers (3)

user8959091

val list = listOf("a", "A ", "A", "A ", "B")
        .map { it.trim() }
        .distinctBy { it.toLowerCase() }

This works because you trim first and then ask for the distinct values.
If you take the distinct values first, the result is list = [a, A , B],
so trimming afterwards gives: list = [a, A, B]

distinctBy returns the items whose selector results are distinct (different), keeping the first item for each result.

Upvotes: 1

Alex

Reputation: 483

In the first case, .distinctBy { it.toLowerCase() } compares the keys ["a", "a", "a ", "b"], and the resulting list is ["a", "A ", "B"].

When .distinctBy { it.trim() } is then called, it compares the keys "a", "A", "B". These are all different, so the result is still ["a", "A ", "B"].

In the second case, .distinctBy { it.trim() } gives you ["a", "A", "B"]. Then .distinctBy { it.toLowerCase() } compares ["a", "a", "b"], and you get the desired result.
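To see the intermediate lists for yourself, here is a small sketch that prints the result of each pass. The helper names (quoted, lowerThenTrim, trimThenLower) are made up for this illustration, and it uses lowercase(), the replacement for toLowerCase() in newer Kotlin versions.

```kotlin
// Wraps each element in quotes so trailing spaces stay visible when printed.
fun quoted(items: List<String>): List<String> = items.map { "'$it'" }

// First order from the question: case-insensitive pass, then trim pass.
fun lowerThenTrim(input: List<String>): List<String> {
    val afterLower = input.distinctBy { it.lowercase() }
    println("after lowercase pass: ${quoted(afterLower)}")
    return afterLower.distinctBy { it.trim() }
}

// Second order from the question: trim pass, then case-insensitive pass.
fun trimThenLower(input: List<String>): List<String> {
    val afterTrim = input.distinctBy { it.trim() }
    println("after trim pass: ${quoted(afterTrim)}")
    return afterTrim.distinctBy { it.lowercase() }
}

fun main() {
    val input = listOf("a", "A", "A  ", "B")
    println(quoted(lowerThenTrim(input))) // ['a', 'A  ', 'B']
    println(quoted(trimThenLower(input))) // ['a', 'B']
}
```

Note that trimming first only happens to work for this input. With listOf("a ", "A"), for example, trimThenLower still returns both elements, because each pass compares the original, unmodified strings.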

You can also combine both operations in a single distinctBy block:

val list = listOf("a", "A", "A ", "B")
        .distinctBy { it.toLowerCase().trim() }

Upvotes: 1

Roland

Reputation: 23262

distinctBy does not alter the contents of the list. Instead, it uses the given transformation function to compare the entries and then returns a new list containing the original, untransformed entries.

So when a and A count as equal under your first definition, it just returns the first match it found (in this case a). After the first distinctBy, your list contains the following items:

[a, A  , B]

The next distinctBy takes those elements and compares them by their trimmed contents. This leads to the same result, because all of them are distinct under your second transformation function.
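As a rough mental model, distinctBy behaves like the following sketch. firstPerKey is a hypothetical name used only for this illustration, not a stdlib function, and the code is a simplified approximation of the behaviour described above.

```kotlin
// Hypothetical stand-in for distinctBy, written out to show the behaviour.
fun <T, K> Iterable<T>.firstPerKey(selector: (T) -> K): List<T> {
    val seenKeys = HashSet<K>()
    val result = ArrayList<T>()
    for (element in this) {
        // The key is only used for the comparison;
        // the original element is what ends up in the result.
        if (seenKeys.add(selector(element))) {
            result.add(element)
        }
    }
    return result
}

fun main() {
    val input = listOf("a", "A", "A  ", "B")
    // Same result as the stdlib call for this input.
    println(input.firstPerKey { it.lowercase() } == input.distinctBy { it.lowercase() }) // true
}
```

Because the stored elements are never transformed, the second pass in the question still sees "A  " with its trailing spaces and its capital letter.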

What you probably want to do is something like:

listOf("a", "A", "A  ", "B")
   .distinctBy { it.toLowerCase().trim() }

which actually combines both transformation functions and leads to:

[a, B]

Alternatively you could do something like:

listOf("a", "A", "A  ", "B")
    .map(String::toLowerCase)
    .map(String::trim)
    .distinct()
    .also(::println)

which then returns:

[a, b]

Or if you really care which input you got:

listOf("a", "A", "A  ", "B")
    .groupBy { it.toLowerCase().trim() }
    .let(::println)

which returns:

{a=[a, A, A  ], b=[B]}

From here you can take one or all of the original inputs. For example, adding .map { it.value[0] } again returns just the first original match of each group.
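Put together, that last step could look like the sketch below. firstOfEachGroup is just an illustrative name, and lowercase() is used in place of toLowerCase(), which newer Kotlin versions deprecate.

```kotlin
// Groups the inputs by their normalized form, then keeps the first
// original input of every group.
fun firstOfEachGroup(input: List<String>): List<String> =
    input
        .groupBy { it.lowercase().trim() } // normalized key -> original inputs
        .map { it.value[0] }               // first original input per key

fun main() {
    println(firstOfEachGroup(listOf("a", "A", "A  ", "B"))) // [a, B]
}
```

For this input it gives the same list as the single distinctBy { it.toLowerCase().trim() } call, but the intermediate map is there if you want to pick a different representative from each group.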

Upvotes: 4
