samba

Reputation: 3111

Java - how to remove duplicates from a collection of timestamps?

I have a List of timestamps in milliseconds. I want to compare them while ignoring the milliseconds part, remove the duplicates, and then process each unique value.

For example, millis2 and millis3 are different values if compared without truncating the milliseconds part (2:28:14.100 vs 2:28:14.200). But I need to disregard the millis, so when the two values are truncated to seconds they should be considered duplicates.

So I decided to create a List of timestamps and sort it in reverse order, then iterate over it, checking whether the truncated values differ, and add the unique values to a List<Long> deduped.

    Long millis0 = 1554052261000L; // Sunday, March 31, 2019 5:11:01 PM
    Long millis1 = 1557023292000L; // Sunday, May 5, 2019 2:28:12 AM
    Long millis2 = 1557023294100L; // Sunday, May 5, 2019 2:28:14.100 AM
    Long millis3 = 1557023294200L; // Sunday, May 5, 2019 2:28:14.200 AM

    List<Long> initialTimestamps = Arrays.asList(millis2, millis3, millis0, millis1);

    Comparator<Long> comparator = Collections.reverseOrder();
    Collections.sort(initialTimestamps, comparator);

    Long prevTs = null;
    List<Long> deduped = new ArrayList<>();

    for (Long ts : initialTimestamps) {
        if (prevTs != null && !millisToSeconds(prevTs).equals(millisToSeconds(ts))) {
            deduped.add(prevTs);
            process(prevTs);
        }
        prevTs = ts;
        deduped.add(prevTs);
        process(prevTs);
    }

However, when I print out the contents of deduped, there are duplicates:

Deduped timestamps ->
1557023294200
1557023294100
1557023294100
1557023292000
1557023292000
1554052261000

But I expect that after deduplication only 1557023294, 1557023292 and 1554052261 will remain. What am I missing here?

Upvotes: 0

Views: 472

Answers (1)

Svetlin Zarev

Reputation: 15683

If you can use Java 8, you can use stream().distinct():

public static void main(String[] args) throws Exception {
    Long millis0 = 1554052261000L; // Sunday, March 31, 2019 5:11:01 PM
    Long millis1 = 1557023292000L; // Sunday, May 5, 2019 2:28:12 AM
    Long millis2 = 1557023294100L; // Sunday, May 5, 2019 2:28:14.100 AM
    Long millis3 = 1557023294200L; // Sunday, May 5, 2019 2:28:14.200 AM

    List<Long> initialTimestamps = Arrays.asList(millis2, millis3, millis0, millis1);
    List<Long> unique = initialTimestamps.stream().distinct().collect(Collectors.toList());

    System.out.println(unique);
}

For Java < 8, you can put them in a Set:

public static void main(String[] args) throws Exception {
    Long millis0 = 100L; // simplified sample value
    Long millis1 = 100L; // duplicate of millis0
    Long millis2 = 200L; // simplified sample value
    Long millis3 = 200L; // duplicate of millis2

    List<Long> initialTimestamps = Arrays.asList(millis2, millis3, millis0, millis1);
    Set<Long> unique = new HashSet<Long>(initialTimestamps);

    System.out.println(unique);
}

Update

As per your requirement to ignore the milliseconds: if you want to preserve the original millisecond values, you can use a Map keyed by the timestamp in seconds. If you do not care about the milliseconds, use one of the approaches above and just divide the values by 1_000 first. With a Map:

public static void main(String[] args) throws Exception {
    Long millis0 = 1554052261000L; // Sunday, March 31, 2019 5:11:01 PM
    Long millis1 = 1557023292000L; // Sunday, May 5, 2019 2:28:12 AM
    Long millis2 = 1557023294100L; // Sunday, May 5, 2019 2:28:14.100 AM
    Long millis3 = 1557023294200L; // Sunday, May 5, 2019 2:28:14.200 AM

    List<Long> initialTimestamps = Arrays.asList(millis2, millis3, millis0, millis1);
    Map<Long, Long> unique = new HashMap<>();

    for (Long timestamp : initialTimestamps) {
        unique.put(timestamp / 1000, timestamp);
    }

    System.out.println(unique.values());
}
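
For the other case, where the milliseconds do not need to be kept, dividing by 1_000 before calling distinct() could look roughly like the sketch below. The class and method names (DistinctSeconds, distinctSeconds) are made up for illustration, and the result holds seconds rather than the original millisecond values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctSeconds {
    // Hypothetical helper: truncates each millisecond timestamp to whole
    // seconds, then drops repeated values.
    static List<Long> distinctSeconds(List<Long> timestampsMillis) {
        return timestampsMillis.stream()
                .map(ts -> ts / 1000)   // integer division discards the millisecond part
                .distinct()             // keeps the first occurrence, in encounter order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> initialTimestamps = Arrays.asList(
                1557023294100L, 1557023294200L, 1554052261000L, 1557023292000L);
        System.out.println(distinctSeconds(initialTimestamps));
        // prints [1557023294, 1554052261, 1557023292]
    }
}
```

The Map-based loop above remains the option to use when the original millisecond values are needed afterwards.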

In the Map version, if you want to preserve the first value of each duplicate, then use

if (!unique.containsKey(timestamp / 1000)) {
    unique.put(timestamp / 1000, timestamp);
}

instead of just put(). If you want to preserve the original order of the timestamps, then you should use LinkedHashMap instead of HashMap.
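
Putting those two suggestions together, a minimal sketch might look like this. It assumes Java 8 or later, so that Map.putIfAbsent can stand in for the containsKey check. The names FirstPerSecond and firstPerSecond are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstPerSecond {
    // Hypothetical helper: keeps the first timestamp seen for each whole
    // second, in the order the seconds first appear in the input.
    static List<Long> firstPerSecond(List<Long> timestampsMillis) {
        // LinkedHashMap iterates in insertion order, unlike HashMap.
        Map<Long, Long> unique = new LinkedHashMap<>();
        for (Long ts : timestampsMillis) {
            // Key is the timestamp in seconds; later duplicates are ignored.
            unique.putIfAbsent(ts / 1000, ts);
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        List<Long> initialTimestamps = Arrays.asList(
                1557023294100L, 1557023294200L, 1554052261000L, 1557023292000L);
        System.out.println(firstPerSecond(initialTimestamps));
        // prints [1557023294100, 1554052261000, 1557023292000]
    }
}
```

Each retained value can then be passed to process() while iterating over the returned list.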

Upvotes: 4
