user1923631

Reputation: 403

Ordering a set without duplication in Java

I need to order a set without duplicating it in memory, using a custom comparator.

The naive implementation would be:

Set<MyClass> newSet = new TreeSet<>(myComparator);
newSet.addAll(oldSet);

but this would mean that, even if only for a limited time, I'll have two sets in memory: oldSet (unordered) and newSet (ordered). Since they will be very large, I would like to avoid this.

I would like to perform something like this:

oldSet = new TreeSet<>(oldSet, myComparator);

which is not actually possible, since TreeSet has no constructor that takes both a collection and a comparator.

Could this be a solution?

Iterator<MyClass> it = oldSet.iterator();
Set<MyClass> newSet = new TreeSet<>(myComparator);
while(it.hasNext())
{
    newSet.add(it.next());
    it.remove();
}       

Something better to suggest?

Thank you

Upvotes: 2

Views: 132

Answers (5)

dbrown0708

Reputation: 4754

You could write an Iterator implementation where each call to next() gives you the next item in sorted order. It will take some additional memory, but the amount will be small compared to duplicating the unordered Set. You won't get a new Set, but you will be able to iterate through the existing one in order.

A low-memory but inefficient version would store only the most recently returned item in the Iterator. Every time it needs to return the next item, it goes through all the items in the backing Set to find the smallest one that is greater than the last item returned.
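A minimal sketch of that low-memory variant might look like the following. The class name SortedScanIterator is made up for illustration, and the sketch assumes that no two distinct elements compare as equal and that the set contains no nulls. Each call to next() scans the whole backing collection, so a full pass costs O(n^2) comparisons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical helper: iterates a collection in comparator order without copying it.
// Assumption: no two distinct elements compare as equal, and there are no null elements.
class SortedScanIterator<T> implements Iterator<T> {
    private final Collection<T> backing;
    private final Comparator<? super T> comparator;
    private T last;        // most recently returned element
    private int returned;  // how many elements have been handed out so far

    SortedScanIterator(Collection<T> backing, Comparator<? super T> comparator) {
        this.backing = backing;
        this.comparator = comparator;
    }

    @Override
    public boolean hasNext() {
        return returned < backing.size();
    }

    @Override
    public T next() {
        T best = null;
        // Full scan: pick the smallest element strictly greater than 'last'.
        for (T candidate : backing) {
            if (returned > 0 && comparator.compare(candidate, last) <= 0) {
                continue; // already returned on an earlier call
            }
            if (best == null || comparator.compare(candidate, best) < 0) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new NoSuchElementException();
        }
        last = best;
        returned++;
        return best;
    }

    public static void main(String[] args) {
        Set<Integer> oldSet = new HashSet<>(Arrays.asList(5, 1, 4, 2, 3));
        Iterator<Integer> it =
                new SortedScanIterator<>(oldSet, Comparator.<Integer>naturalOrder());
        List<Integer> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        System.out.println(seen); // prints [1, 2, 3, 4, 5]
    }
}
```

The only extra state is one reference and one counter, which is the trade-off described above: almost no memory, paid for with a full scan per element.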

Upvotes: 0

Pawel

Reputation: 1467

When you create a set from another set, you create a shallow copy: only the references are copied. When you remove an element, only a reference is removed too. The code below shows this:

MyComparator myComparator = new MyComparator();
Set<Object> newSet = new TreeSet<>(myComparator);
Object mc = new Object();
newSet.add(mc); //set is created

Set<Object> newerSet = new TreeSet<>(myComparator);
newerSet.addAll(newSet);
System.out.println(newSet);
System.out.println(newerSet);

Output:
[java.lang.Object@1bb1deea]
[java.lang.Object@1bb1deea]

Both sets hold a reference to the same object.

newerSet.remove(mc);
System.out.println("After deletion");
System.out.println(newSet);
System.out.println(newerSet);

Output:
After deletion
[java.lang.Object@1bb1deea]
[]

Only the reference in newerSet is removed; the object itself is untouched.

Upvotes: 0

Marko Topolnik

Reputation: 200158

Using a TreeSet is not the most memory-efficient approach for this, and it is not the fastest one either.

You should use an ArrayList and perform a sort on it:

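// Copy the references into the list; passing oldSet.size() would only set the capacity
List<MyClass> sorted = new ArrayList<>(oldSet);
// Drop the reference so the old set can be garbage-collected
oldSet = null;
Collections.sort(sorted, myComparator);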

The overhead of the single array used inside ArrayList should not be an issue, and in any case it is the smallest overhead you can have.

The single-shot bulk sort operation is faster than finding the right place for each individual item in a TreeSet, along with all the allocation needed in that case.
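As a rough, self-contained sketch of the list-based approach, assuming a sorted List is acceptable in place of a Set: the class name SortViaList, the helper toSortedList, and the sample strings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the original answer.
class SortViaList {
    // Copies the references into one array-backed list, then sorts that list in place.
    static <T> List<T> toSortedList(Collection<T> source, Comparator<? super T> comparator) {
        List<T> sorted = new ArrayList<>(source);
        Collections.sort(sorted, comparator);
        return sorted;
    }

    public static void main(String[] args) {
        Set<String> oldSet = new HashSet<>(Arrays.asList("banana", "fig", "pear"));
        Comparator<String> byLength = Comparator.<String>comparingInt(String::length);

        List<String> sorted = toSortedList(oldSet, byLength);
        oldSet = null; // the old set becomes collectable once no other reference remains

        System.out.println(sorted); // prints [fig, pear, banana]
    }
}
```

Note that the list and the old set still coexist briefly while the list is being filled, but the list adds only one reference per element rather than a tree node per element.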

Upvotes: 2

Evgeniy Dorofeev

Reputation: 136002

If you can null out all the references to the old set, do it:

newSet.addAll(oldSet);
oldSet = null;

If you cannot null out all the references to the old set, use the Set.clear method:

newSet.addAll(oldSet);
oldSet.clear();

Note that after clear(), a HashSet's internal hash table does not shrink.
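A small sketch of the clear() variant, assuming some other piece of code still holds a reference to the old set (modelled here by the local variable alias). The class name MoveAndClear and the helper moveSorted are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original answer.
class MoveAndClear {
    // Copies the references into a TreeSet, then empties the source set in place.
    static <T> Set<T> moveSorted(Set<T> source, Comparator<? super T> comparator) {
        Set<T> sorted = new TreeSet<>(comparator);
        sorted.addAll(source);
        source.clear(); // every holder of this set now sees it as empty
        return sorted;
    }

    public static void main(String[] args) {
        Set<Integer> oldSet = new HashSet<>(Arrays.asList(3, 1, 2));
        Set<Integer> alias = oldSet; // stands in for a reference held elsewhere

        Set<Integer> newSet = moveSorted(oldSet, Comparator.<Integer>naturalOrder());

        System.out.println(newSet);          // prints [1, 2, 3]
        System.out.println(alias.isEmpty()); // prints true
    }
}
```

Both sets are fully populated for a moment between addAll and clear, so this frees the old entries afterwards rather than avoiding the temporary overlap.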

Upvotes: 0

LionC

Reputation: 3106

As a Set is not ordered by definition, there is no way to order a Set, so (as you already do) you have to use an ordered data structure. However, you do not need to worry about the problem you describe: Java does not perform a deep copy of the Set when you call addAll. It only copies references, which takes little memory compared with the objects themselves.

So your addAll solution is a clean and correct one.

Upvotes: 0
