Reputation: 197

Remove duplicate data from an array in python

I have this array of data

data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

I remove the duplicate data with list(set(data)), which gives me

data = [20001202.05, 20001202.50, 20001215.75, 20021215.75]

But I would like to remove duplicates based on the digits before the decimal point; for instance, if the array contains both 20001202.05 and 20001202.50, I want to keep only one of them.

Upvotes: 1

Answers (3)

Nick Humrich

Reputation: 15805

In general, with Python 3.7+, dictionaries preserve insertion order, so you can do this even when order matters:

data = list({d: None for d in data})

However, the OP wants to de-duplicate based on the integer part, not the full number, so see the top-voted answer for that. This approach only removes exact duplicates.
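
A minimal sketch of the same idea using dict.fromkeys, which builds the dictionary without a comprehension. The variable name unique is only illustrative, and the order guarantee assumes Python 3.7 or later:

```python
data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# dict.fromkeys keeps one key per distinct value, in first-seen order
# (dict insertion order is guaranteed from Python 3.7 onward).
unique = list(dict.fromkeys(data))
print(unique)
```

Note that 20001202.05 and 20001202.50 both survive here, since only exact duplicates are dropped.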

Upvotes: 4

San k

Reputation: 141

data1 = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]
ls = []  # collects each value the first time it is seen
for i in data1:
    if i not in ls:
        ls.append(i)
print(ls)

Upvotes: 1

bufh

Reputation: 3420

As you don't care about the order of the items you keep, you could do:

>>> {int(d):d for d in data}.values()
[20001202.5, 20021215.75, 20001215.75]
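
To make explicit which item the dictionary approach keeps, here is a small sketch assuming Python 3.7+, where dictionaries preserve insertion order and .values() returns a view that has to be wrapped in list(). The names keep_last, keep_first and first_by_key are illustrative only:

```python
data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# Later items overwrite earlier ones that share the same int() key,
# so the last value seen for each integer part survives.
keep_last = list({int(d): d for d in data}.values())

# setdefault stores a value only the first time a key appears,
# so the first value seen for each integer part survives.
first_by_key = {}
for d in data:
    first_by_key.setdefault(int(d), d)
keep_first = list(first_by_key.values())

print(keep_last)
print(keep_first)
```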

If you would like to keep the lowest item, I can't think of a one-liner.

Here is a basic example for anybody who would like to add a condition on the key or value to keep.

seen = set()
result = []
for item in sorted(data):
    key = int(item)  # or whatever condition
    if key not in seen:
        result.append(item)
        seen.add(key)
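
The loop above could be wrapped in a reusable function along these lines. This is only a sketch: dedupe_by_key and its key parameter are hypothetical names, and the second call groups by the leading four digits purely as an illustration of a different key.

```python
def dedupe_by_key(items, key=int):
    """Return the smallest item for each distinct key(item)."""
    seen = set()
    result = []
    # Sorting first means the lowest item per key is encountered first.
    for item in sorted(items):
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# Default key: the integer part of each number.
lowest = dedupe_by_key(data)

# Illustrative alternative key: the leading four digits of the integer part.
by_prefix = dedupe_by_key(data, key=lambda x: int(x) // 10000)

print(lowest)
print(by_prefix)
```

Dropping the sorted() call would instead keep the first item in the original order for each key.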

Upvotes: 11
