Reputation: 64793
I am curious what would be an efficient way of uniquifying such data objects:
testdata = [ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']
]
For each data pair, the numeric string on the left plus the type on the right together determine the uniqueness of a data element. The returned value should be a list of lists, the same as testdata, but with only unique values kept.
Upvotes: 100
Views: 111815
Reputation: 24366
Options for preserving order (Python 3.7+)
Inner lists become tuples:
list(dict.fromkeys(map(tuple, testdata)))
list({tuple(x): 1 for x in testdata})
Inner lists stay as lists (credits):
list({tuple(x): x for x in testdata}.values())
In case new list elements are a function of old ones, either the walrus operator := (Python 3.8+) can be used
list({tuple(y:=f(x)): y for x in testdata}.values())
or we can turn inner lists into tuples and then back to lists
list(map(list, {tuple(x): 1 for x in testdata}))
list(map(list, dict.fromkeys(map(tuple, testdata))))
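To make the walrus variant concrete, here is a small sketch. The function f below is a made-up normaliser (it is not part of the question), and the sample data is invented for illustration.

```python
# Hypothetical transformation: strip whitespace and upper-case the type.
def f(pair):
    number, kind = pair
    return [number.strip(), kind.upper()]

sample = [[' 11111', 'not'], ['9034968', 'ETH'], ['11111 ', 'NOT'], ['9034968', 'eth']]

# Deduplicate on the transformed value, keeping first-seen order.
# A later duplicate overwrites the stored value but keeps the key's original position.
unique = list({tuple(y := f(x)): y for x in sample}.values())
print(unique)  # [['11111', 'NOT'], ['9034968', 'ETH']]
```

The dict key is evaluated before the value, which is why y is already bound when it is used as the value.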
Upvotes: 3
Reputation: 2509
I was about to post my own take on this until I noticed that @pyfunc had already come up with something similar. I'll post my take on this problem anyway in case it's helpful.
testdata =[ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']
]
flatdata = [p[0] + "%" + p[1] for p in testdata]
flatdata = list(set(flatdata))
testdata = [p.split("%") for p in flatdata]
print(testdata)
Basically, you concatenate each element of your list into a single string using a list comprehension, so that you have a list of single strings. This is then much easier to turn into a set, which makes it unique. Then you simply split it on the other end and convert it back to your original list.
I don't know how this compares in terms of performance but it's a simple and easy-to-understand solution I think.
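If you want to check the performance yourself, a rough timing sketch along these lines might help. The function names via_strings and via_tuples are my own, the data is a shortened sample, and the numbers will vary by machine, so treat it as a starting point rather than a benchmark.

```python
import timeit

testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['9555269', 'NOT'], ['11111', 'NOT']]

def via_strings(data):
    # Join each pair on "%", deduplicate, then split back.
    # Assumes "%" never occurs inside either field.
    return [s.split("%") for s in set(p[0] + "%" + p[1] for p in data)]

def via_tuples(data):
    # Deduplicate on hashable tuples, then convert back to lists.
    return [list(t) for t in set(tuple(p) for p in data)]

# Both approaches should agree on the unique pairs (order may differ).
assert sorted(via_strings(testdata)) == sorted(via_tuples(testdata))

for fn in (via_strings, via_tuples):
    seconds = timeit.timeit(lambda: fn(testdata), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f}s for 10,000 runs")
```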
Upvotes: 1
Reputation: 28322
Use unique in numpy to solve this:
import numpy as np
np.unique(np.array(testdata), axis=0)
Note that the axis keyword needs to be specified; otherwise the list is first flattened.
Alternatively, use vstack:
np.vstack(list({tuple(row) for row in testdata}))
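Since the question asks for a list of lists, here is a minimal sketch of converting the NumPy result back. It uses a shortened sample of the data; note that np.unique returns the rows sorted rather than in their original order.

```python
import numpy as np

testdata = [['9034968', 'ETH'], ['11111', 'NOT'], ['9034968', 'ETH']]

# axis=0 treats each row as one item; the result comes back sorted by row.
unique_rows = np.unique(np.array(testdata), axis=0)

# Convert the 2-D string array back to plain Python lists of str.
unique_data = unique_rows.tolist()
print(unique_data)
```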
Upvotes: 6
Reputation: 1518
If you have a list of objects, then you can modify @Mark Byers' answer to:
unique_data = [list(x) for x in set(tuple(x.testList) for x in testdata)]
where testdata is a list of objects, each of which has a list testList as an attribute.
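A self-contained sketch of that situation, assuming a hypothetical Record class (the class name is invented here; only the testList attribute comes from the description above):

```python
# Hypothetical container class; only the testList attribute matters here.
class Record:
    def __init__(self, testList):
        self.testList = testList

testdata = [
    Record(['9034968', 'ETH']),
    Record(['11111', 'NOT']),
    Record(['9034968', 'ETH']),
]

# Deduplicate on the attribute's contents; the order of the result is arbitrary.
unique_data = [list(x) for x in set(tuple(x.testList) for x in testdata)]
print(unique_data)
```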
Upvotes: 1
Reputation: 3308
Expanding a bit on @Mark Byers' solution, you can also do it with a single generator expression and a conversion back to a list (note that the inner items end up as tuples):
testdata = list(set(tuple(x) for x in testdata))
Also, if you don't like list comprehensions, as many find them confusing, you can do the same with a for loop:
for i, e in enumerate(testdata):
    testdata[i] = tuple(e)
testdata = list(set(testdata))
Upvotes: 3
Reputation: 74695
I tried @Mark's answer and got an error. Converting each element of the list into a tuple made it work. Not sure if this is the best way, though.
list(map(list, set(map(lambda i: tuple(i), testdata))))
Of course the same thing can be expressed using a list comprehension instead.
[list(i) for i in set(tuple(i) for i in testdata)]
I am using Python 2.6.2.
Update
@Mark has since changed his answer. His current answer uses tuples and will work. So will mine :)
Update 2
Thanks to @Mark. I have changed my answer to return a list of lists rather than a list of tuples.
Upvotes: 11
Reputation: 66709
testdata =[ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']]
# Concatenate each pair into one string so it can be hashed.
concatData = [x[0] + x[1] for x in testdata]
print(concatData)
# The built-in set replaces the deprecated sets.Set (the sets module is gone in Python 3).
uniqueSet = set(concatData)
# Split each string back into [number, type]; this relies on every type code being exactly 3 characters long.
uniqueList = [[t[0:-3], t[-3:]] for t in uniqueSet]
print(uniqueList)
Upvotes: 1
Reputation: 838056
You can use a set:
unique_data = [list(x) for x in set(tuple(x) for x in testdata)]
You can also see this page which benchmarks a variety of methods that either preserve or don't preserve order.
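Note that the set-based one-liner does not keep the original order. If order matters, one possible sketch is a loop with a "seen" set; the helper name unique_pairs is my own and the sample data is shortened.

```python
def unique_pairs(data):
    """Return the first occurrence of each pair, in original order."""
    seen = set()
    result = []
    for pair in data:
        key = tuple(pair)  # lists are unhashable, so use a tuple as the key
        if key not in seen:
            seen.add(key)
            result.append(pair)
    return result

testdata = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['11111', 'NOT']]
print(unique_pairs(testdata))
# [['9034968', 'ETH'], ['14160113', 'ETH'], ['11111', 'NOT']]
```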
Upvotes: 173