cardician
cardician

Reputation: 2491

Fast way to iterate through a set and remove items meeting certain criteria

So I've got a program that contains a bunch of records in a Set. The set could have a few items or potentially hundreds of thousands. One bit of data each record has is a timestamp. I need to eliminate all items in a set but one that are within 15 seconds of each other. What is the most efficient way to do that?

Currently I create a duplicate of the set. Then I iterate through the set comparing the first item to every other item, and repeat. If a match is found that is within 15 seconds, I remove it from the duplicate set. Then the duplicate set is written out to a file.

Obviously this works but I finally realized that this is ridiculously inefficient. For large sets this seems to take a crazy long time, assuming it's not some other problem occurring. Can someone provide me with a smarter, faster, more efficient (or just a proper) way to do this in Java? I'm realizing since the records contain timestamps that sorting them would probably help tremendously. I'd like to keep this all contained within the program though so I guess I need to look into sorting and Comparators.

I just can't quite wrap my head around the problem. I've come up with some other thoughts to improve my code but I can't help but feeling I'm still coming at this entirely wrong. Thanks for any suggestions.

Oh, and this is for work, not school or anything so any help is appreciated.

Upvotes: 0

Views: 1610

Answers (4)

mightyrick
mightyrick

Reputation: 910

I haven't tested for performance, but one way I might implement this is to create a Set and override the equals() method for the object types in question.

public boolean equals( Object o )
{
  return( Math.abs( this.getTimestampSeconds() - o.getTimestampSeconds() ) < 15 );
}

By doing this, when you add each row to the set, you will only end up with one entry for any given 15 second time slice.

* EDIT **

I would not perform this override for a regular domain object. I probably would only do this in a facade object of some sort -- which is created solely for this purpose.

Also, as others have said. This presumes that your input list is sorted by ascending timestamp.

Upvotes: 0

lbalazscs
lbalazscs

Reputation: 17809

You can continue using a Set, just make sure that it is sorted from the very beginning, like TreeSet (or ConcurrentSkipListSet if you have multiple threads). Either you implement Comparable so that the timestamps are compared, or you supply a Comparator that does the same.

This will guarantee that you cannot have duplicates (like you had it until now), and also simplify your code. Inserting into a TreeSet will also cost you O(n log n) time.

From here on you can continue with the approach suggested by Sam I am: the iterator will traverse it in ascending element order, you need to compare each element only with the previous one and the next one.

BTW, you don't need to copy everything to another Set, just make sure to use the remove method of the iterator, not the remove of the TreeSet: Iterating through a Collection, avoiding ConcurrentModificationException when removing in loop

Upvotes: 1

fge
fge

Reputation: 121860

If you have a map, say:

Map<Long, List<MyClass>> map;

where the key is the timestamp, then you can do this:

// Value of wanted elements
List<MyClass> ret = new ArrayList<MyClass>();

// Go over all timestamps: if a timestamp is wanted, add all
// corresponding elements
for (Map.Entry<Long, List<MyClass>> entry: map.entrySet())
    if (wanted(entry.getKey()))
        ret.addAll(entry.getValue());

// Return
return ret;

Upvotes: 0

Right now, the algorithm you've described runs in O(n2) time.

Now, If you needed a faster algorithm, what you can do is

  • Sort your collection (I'd be surprised if java didn't have a base-class sorting function) O(n * lg(n))
  • All "matches" within 15 seconds of each other are going to be right next to each-other
  • You need only iterate through each element once checking only adjacent elements O(n)

If you do that, than your algorithm could be a much more manageble O(n * lg(n)) time complexity

Here's some information regarding Java's Array.sort()

Upvotes: 5

Related Questions