George

Reputation: 3030

Java 8 Stream Merge Partial Duplicates

I have a POJO that looks something like this:

public class Account {
    private Integer accountId;
    private List<String> contacts;
}

The equals and hashCode methods use only the accountId field to determine uniqueness, so any two Accounts with the same accountId are equal regardless of what contacts contains.

I have a List of accounts, and some of them are duplicates with the same accountId. How do I use the Java 8 Stream API to merge these duplicates together?

For example, the list of accounts contains:

+-----------+----------+
| accountId | contacts |
+-----------+----------+
|         1 | {"John"} |
|         1 | {"Fred"} |
|         2 | {"Mary"} |
+-----------+----------+

And I want it to produce a list of accounts like this:

+-----------+------------------+
| accountId |     contacts     |
+-----------+------------------+
|         1 | {"John", "Fred"} |
|         2 | {"Mary"}         |
+-----------+------------------+

Upvotes: 4

Views: 4706

Answers (3)

fps

Reputation: 34460

You could add two constructors and a merge method to the Account class that would combine contacts:

public class Account {

    private final Integer accountId;

    private List<String> contacts = new ArrayList<>();

    public Account(Integer accountId) {
        this.accountId = accountId;
    }

    // Copy constructor
    public Account(Account another) {
        this.accountId = another.accountId;
        this.contacts = new ArrayList<>(another.contacts);
    }

    public Account merge(Account another) {
        this.contacts.addAll(another.contacts);
        return this;
    }

    public Integer getAccountId() {
        return accountId;
    }

    public List<String> getContacts() {
        return contacts;
    }
}

Then, you have a few alternatives. One is to use Collectors.toMap to collect accounts to a map, grouping by accountId and merging the contacts of the accounts with equal accountId by means of the Account.merge method. Finally, get the values of the map:

Collection<Account> result = accounts.stream()
    .collect(Collectors.toMap(
        Account::getAccountId, // group by accountId (keys)
        Account::new,          // use copy constructor (values)
        Account::merge))       // merge values with equal key
    .values();

You need to use the copy constructor for the values, otherwise you would mutate the accounts of the original list when Account.merge is invoked.
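To make the point about the copy constructor concrete, here is a minimal, self-contained sketch of the toMap approach. The class name ToMapMergeDemo, the varargs constructor, and the mergeDuplicates helper are hypothetical names added for illustration. The four-argument toMap overload with LinkedHashMap::new is also my addition (not part of the snippet above); it only serves to keep the ids in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapMergeDemo {

    static class Account {
        private final Integer accountId;
        private final List<String> contacts = new ArrayList<>();

        // Hypothetical convenience constructor for building sample data
        Account(Integer accountId, String... contacts) {
            this.accountId = accountId;
            this.contacts.addAll(Arrays.asList(contacts));
        }

        // Copy constructor: the copy gets its own contacts list
        Account(Account another) {
            this.accountId = another.accountId;
            this.contacts.addAll(another.contacts);
        }

        Account merge(Account another) {
            this.contacts.addAll(another.contacts);
            return this;
        }

        Integer getAccountId() { return accountId; }
        List<String> getContacts() { return contacts; }
    }

    static List<Account> mergeDuplicates(List<Account> accounts) {
        Map<Integer, Account> byId = accounts.stream()
            .collect(Collectors.toMap(
                Account::getAccountId, // key: accountId
                Account::new,          // value: a copy, so merge never touches the originals
                Account::merge,        // combine values that share a key
                LinkedHashMap::new));  // assumption: preserve first-seen order of ids
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
            new Account(1, "John"), new Account(1, "Fred"), new Account(2, "Mary"));

        for (Account a : mergeDuplicates(accounts)) {
            System.out.println(a.getAccountId() + " " + a.getContacts());
        }
        // prints:
        // 1 [John, Fred]
        // 2 [Mary]

        // The first original account still has only its own contact
        System.out.println(accounts.get(0).getContacts()); // prints [John]
    }
}
```

If Account::new were replaced with Function.identity() here, the last line would print [John, Fred] instead, because the merge would run on the original object.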

An equivalent way (without streams) would be to use the Map.merge method:

Map<Integer, Account> map = new HashMap<>();
accounts.forEach(a -> 
    map.merge(a.getAccountId(), new Account(a), Account::merge));
Collection<Account> result = map.values();

Again, you need to use the copy constructor to avoid undesired mutations on the accounts of the original list.

A third alternative uses the Map.computeIfAbsent method. It allocates less, because a new account is created only the first time an accountId is seen rather than for every element of the list:

Map<Integer, Account> map = new HashMap<>();
accounts.forEach(a -> map.computeIfAbsent(
        a.getAccountId(), // group by accountId (keys)
        Account::new)     // invoke new Account(accountId) if absent
    .merge(a));           // merge account's contacts
Collection<Account> result = map.values();
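As a rough, runnable sketch of the computeIfAbsent variant: the class name ComputeIfAbsentMergeDemo and the account(...) factory helper are hypothetical, and I use a LinkedHashMap instead of a HashMap only so that the output order is predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ComputeIfAbsentMergeDemo {

    static class Account {
        private final Integer accountId;
        private final List<String> contacts = new ArrayList<>();

        Account(Integer accountId) {
            this.accountId = accountId;
        }

        Account merge(Account another) {
            this.contacts.addAll(another.contacts);
            return this;
        }

        Integer getAccountId() { return accountId; }
        List<String> getContacts() { return contacts; }
    }

    // Hypothetical helper for building sample data
    static Account account(Integer id, String... contacts) {
        Account a = new Account(id);
        a.getContacts().addAll(Arrays.asList(contacts));
        return a;
    }

    static Collection<Account> mergeDuplicates(List<Account> accounts) {
        Map<Integer, Account> map = new LinkedHashMap<>();
        accounts.forEach(a -> map
            .computeIfAbsent(a.getAccountId(), Account::new) // empty account, created once per id
            .merge(a));                                      // then append this element's contacts
        return map.values();
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
            account(1, "John"), account(1, "Fred"), account(2, "Mary"));

        for (Account a : mergeDuplicates(accounts)) {
            System.out.println(a.getAccountId() + " " + a.getContacts());
        }
        // prints:
        // 1 [John, Fred]
        // 2 [Mary]
    }
}
```

Because every account in the map starts out empty and only receives contacts through merge, the accounts of the original list are left untouched here as well.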

All the alternatives above return a Collection<Account>. If you need a List<Account> instead, you can do:

List<Account> list = new ArrayList<>(result);

Upvotes: 1

Holger

Reputation: 298203

A clean Stream API solution can be quite complicated, so perhaps you’re better off with a Collection API solution that has fewer constraints to obey.

HashMap<Integer, Account> tmp = new HashMap<>();
listOfAccounts.removeIf(a -> a != tmp.merge(a.getAccountId(), a, (o,n) -> {
    o.getContacts().addAll(n.getContacts());
    return o;
}));

This directly removes all elements with a duplicate id from the list after having added their contacts to the first account of that id.

Of course, this assumes that the list supports removal and the list returned by getContacts() is a reference to the stored list and supports adding elements.

The solution is built around Map.merge, which adds the specified object if the key didn’t exist, or evaluates the merge function if the key already existed. The merge function returns the old object after having added the contacts, so we can do a reference comparison (a != …) to determine that we have a duplicate that should be removed.
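For anyone who wants to try this out, the following is a small self-contained sketch under the assumptions stated above (a list that supports removal, and a getContacts() that returns the stored, modifiable list). The class name RemoveIfMergeDemo, the Account stand-in, and the mergeInPlace helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoveIfMergeDemo {

    static class Account {
        private final Integer accountId;
        private final List<String> contacts;

        Account(Integer accountId, String... contacts) {
            this.accountId = accountId;
            // Wrap in ArrayList so the stored list supports addAll
            this.contacts = new ArrayList<>(Arrays.asList(contacts));
        }

        Integer getAccountId() { return accountId; }
        List<String> getContacts() { return contacts; } // returns the stored list itself
    }

    static void mergeInPlace(List<Account> listOfAccounts) {
        Map<Integer, Account> tmp = new HashMap<>();
        listOfAccounts.removeIf(a -> a != tmp.merge(a.getAccountId(), a, (o, n) -> {
            // Key already present: move the new account's contacts into the old one
            o.getContacts().addAll(n.getContacts());
            return o; // returning the old object makes "a != result" true for duplicates
        }));
    }

    public static void main(String[] args) {
        // Arrays.asList alone is fixed-size, so copy it into an ArrayList that supports removal
        List<Account> list = new ArrayList<>(Arrays.asList(
            new Account(1, "John"), new Account(1, "Fred"), new Account(2, "Mary")));

        mergeInPlace(list);

        for (Account a : list) {
            System.out.println(a.getAccountId() + " " + a.getContacts());
        }
        // prints:
        // 1 [John, Fred]
        // 2 [Mary]
    }
}
```

Note that this modifies both the list and the first account of each id in place, which is the intent of the approach.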

Upvotes: 2

balki

Reputation: 27664

Use Collectors.toMap. Ref: https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function-java.util.function.BinaryOperator-

@lombok.Value
class Account {
    Integer accountId;
    List<String> contacts;
}

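List<Account> accounts = new ArrayList<>();
// Fill the list; each account's contacts list must be modifiable
List<Account> result = new ArrayList<>(accounts.stream()
    .collect(
        // key: accountId, value: the account itself, merge function: combine contacts
        Collectors.toMap(Account::getAccountId, Function.identity(), (Account account1, Account account2) -> {
            // Note: this mutates the original accounts, since the map holds them directly
            account1.getContacts().addAll(account2.getContacts());
            account2.getContacts().clear();
            return account1;
        })
    )
    .values());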
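One thing worth being aware of with Function.identity() as the value mapper: the map holds the original Account objects, so the merge function changes them. The sketch below illustrates that side effect. It replaces the Lombok class with a hand-written stand-in (an assumption, so that it runs without Lombok), and the names IdentityToMapDemo and mergeDuplicates are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IdentityToMapDemo {

    // Hand-written stand-in for the @lombok.Value class (assumption: no Lombok on the classpath)
    static final class Account {
        private final Integer accountId;
        private final List<String> contacts;

        Account(Integer accountId, List<String> contacts) {
            this.accountId = accountId;
            this.contacts = contacts;
        }

        Integer getAccountId() { return accountId; }
        List<String> getContacts() { return contacts; }
    }

    static List<Account> mergeDuplicates(List<Account> accounts) {
        Map<Integer, Account> byId = accounts.stream()
            .collect(Collectors.toMap(
                Account::getAccountId,
                Function.identity(), // the map stores the original objects, not copies
                (Account account1, Account account2) -> {
                    account1.getContacts().addAll(account2.getContacts());
                    account2.getContacts().clear();
                    return account1;
                }));
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        // Contacts are wrapped in ArrayList because addAll/clear need a modifiable list
        List<Account> accounts = Arrays.asList(
            new Account(1, new ArrayList<>(Arrays.asList("John"))),
            new Account(1, new ArrayList<>(Arrays.asList("Fred"))),
            new Account(2, new ArrayList<>(Arrays.asList("Mary"))));

        List<Account> result = mergeDuplicates(accounts);
        System.out.println(result.size());                 // prints 2

        // The original list's elements have been changed by the merge function
        System.out.println(accounts.get(0).getContacts()); // prints [John, Fred]
        System.out.println(accounts.get(1).getContacts()); // prints []
    }
}
```

If the source list must stay intact, copying each account in the value mapper (instead of using Function.identity()) avoids this.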

Upvotes: 4
