stasbar
stasbar

Reputation: 105

Merge two collections (with duplication) to one collection (without duplication)

I have problem with syncing data on multiple clients. For simplicity, let's say I have two collections where may be duplications (name is the key):

"collection1": [
    {
    "name": "a",
    "timestamp": 1
  },
    {
    "name": "a",
    "timestamp": 2
  },
    {
    "name": "b",
    "timestamp": 1
  }]

   "collection2": [
    {
    "name": "a",
    "timestamp": 3
  },
    {
    "name": "c",
    "timestamp": 2
  }]

And I want collection without duplications (name is the key), with the highest timestamp values. So after merge it should looks like:

"collection3": [
     {
    "name": "b",
    "timestamp": 1
  },
    {
    "name": "a",
    "timestamp": 3
  },
    {
    "name": "c",
    "timestamp": 2
  }]

Btw. I don't care about order. I can't just simply Set = new HashSet<>(collectionn); because lack of replace overload.

My idea is to do 3 times removeDuplication

  1. collection1 with collection1
  2. collection2 with collection2
  3. collection1 with collection2

with this O(n^2) removeDuplication function:

LinkedList<MyObject> finalList = new LinkedList<>();

    for (MyObject newObject : collection) {
        boolean foundSimillar = false;
        for (MyObject objectAlreadyAdded : finalList) {
            if (Objects.equals(newObject, objectAlreadyAdded)) { // in this case if(name1 == name2)
                foundSimillar = true;
                long newObjectTime = newObject.lastTimeModified;
                long alreadyAddedObjectTime = objectAlreadyAdded.lastTimeModified;
                if (newObjectTime > alreadyAddedObjectTime) {
                    finalList.remove(objectAlreadyAdded);
                    finalList.add(newObject);
                    break;
                }
            }
        }
        if(!foundSimillar)
            finalList.add(newObject);
    }

Is there any more efficient algorithm than my 3 * O(n^2) ?

Upvotes: 0

Views: 116

Answers (3)

sprinter
sprinter

Reputation: 27966

If you are using Java 8 and have stored the values in maps then you can do the entire thing in one step if you are ok with updating a current collection rather than creating a new one:

Map<String, Date> map1, map2;

map1.forEach((n, d) -> map2.merge(n, d, (d1, d2) -> d1.after(d2) ? d1 : d2));

Map.merge is a very clever method. It adds the key and value if it doesn't exist. If it does then it applies the given function to determine the value to use. Perfect for your situation really.

If you want it in a new map instead of the old one then:

Map<String, Date> map3 = new HashMap<>(map1);
map2.forEach((n, d) -> map3.merge(n, d, (v1, v2) -> v1.after(v2) ? v1 : v2));

Upvotes: 1

Bohemian
Bohemian

Reputation: 425033

You can do it with standard stream collectors in O(N) time:

Collection<Thing> merged =
        Stream.of(collection1, collection2)
        .flatMap(Collection::stream)
        .collect(groupingBy(Thing::getName, Collectors.maxBy(Comparator.comparing(Thing::getTimestamp))))
        .values()
        .stream()
        .map(Optional::get)
        .collect(Collectors.toList());

Upvotes: 2

ucsunil
ucsunil

Reputation: 7494

You should be able to do this with a Map. You simply check if the key is in the map (then update value if newValue > oldValue), else you move to the next element.

The total time complexity should be O(N)

You can reconstruct your json with a single iteration once all the inserts are completed.

Upvotes: 4

Related Questions