Mike Mcmahon
Mike Mcmahon

Reputation: 65

Python unique values in a list

I am new to Python and I am finding set() to be a bit confusing. Can someone offer some help with finding and creating a new list of unique numbers( another words eliminate duplicates)?

import string
import re

def go():
        import re
        file = open("C:/Cryptography/Pollard/Pollard/newfile.txt","w")
        filename = "C:/Cryptography/Pollard/Pollard/primeFactors.txt"
        with open(filename, 'r') as f:
                lines = f.read()

                found = re.findall(r'[\d]+[^\d.\d+()+\s]+[^\s]+[\d+\w+\d]+[\d+\^+\d]+[\d+\w+\d]+', lines)
                a = found
                for i in range(5):
                         a[i] = str(found[i])
                         print(a[i].split('x'))

Now

print(a[i].split('x')) 

....gives the following output

['2', '3', '1451', '40591', '258983', '11409589', '8337580729',
'1932261797039146667']

['2897', '514081', '585530047', '108785617538783538760452408483163']

['2', '3', '5', '19', '28087', '4947999059',
'2182718359336613102811898933144207']

['3', '5', '53', '293', '31159', '201911', '7511070764480753',
'22798192180727861167']

['2', '164493637239099960712719840940483950285726027116731']

How do I output a list of only non repeating numbers? I read on the forums that "set()" can do this, but I have tried this with no avail. Any help is much appreciated!

Upvotes: 2

Views: 22725

Answers (3)

elucify
elucify

Reputation: 93

If you want unique values from the flattened list, you can use reduce() to flatten the list. Then use the frozenset() constructor to get the result list:

>>> data = [
   ['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
   ['2897', '514081', '585530047', '108785617538783538760452408483163'],
   ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
   ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
   ['2', '164493637239099960712719840940483950285726027116731']]

>>> print list(frozenset(reduce((lambda a, b: a+b), data)))
['514081', '258983', '40591', '201911', '11409589', '585530047', '3',
'2', '5', '108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729', 
'4947999059', '19', '2897', '7511070764480753', '53', '28087', 
'2182718359336613102811898933144207', '1451', '31159',
'1932261797039146667', '293']

Upvotes: 0

Blckknght
Blckknght

Reputation: 104852

A set is a collection (like a list or tuple), but it does not allow duplicates and has very fast membership testing. You can use a list comprehension to filter out values in one list that have appeared in a previous list:

data = [['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
        ['2897', '514081', '585530047', '108785617538783538760452408483163'],
        ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
        ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
        ['2', '164493637239099960712719840940483950285726027116731']]

seen = set() # set of seen values, which starts out empty

for lst in data:
    deduped = [x for x in lst if x not in seen] # filter out previously seen values
    seen.update(deduped)                        # add the new values to the set

    print(deduped)                              # do whatever with deduped list

Output:

['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['5', '19', '28087', '4947999059', '2182718359336613102811898933144207']
['53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']
['164493637239099960712719840940483950285726027116731']

Note that this version does not filter out values that are duplicated within a single list (unless they're already duplicates of a value in a previous list). You could work around that by replacing the list comprehension with an explicit loop that checks each individual value against the seen set (and adds it if it's new) before appending to a list for output. Or if the order of the items in your sub-lists is not important, you could turn them into sets of their own:

seen = set()
for lst in data:
    lst_as_set = set(lst)               # this step eliminates internal duplicates
    deduped_set = lst_as_set - seen     # set subtraction!
    seen.update(deduped_set)

    # now do stuff with deduped_set, which is iterable, but in an arbitrary order

Finally, if the internal sub-lists are a red herring entirely and you want to simply filter a flattened list to get only unique values, that sounds like a job for the unique_everseen recipe from the itertools documentation:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Upvotes: 4

Anthony Kong
Anthony Kong

Reputation: 40874

set should work in this case.

You can try the following:

# Concat all your lists into a single list
>>> a = ['2', '3', '1451', '40591', '258983', '11409589', '8337580729','1932261797039146667'] +['2897', '514081', '585530047', '108785617538783538760452408483163'] +['2', '3', '5', '19', '28087', '4947999059','2182718359336613102811898933144207'] + ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']+ ['2', '164493637239099960712719840940483950285726027116731']
>>> len(a)
29
>>> set(a)
set(['514081', '258983', '40591', '201911', '11409589', '585530047', '3', '2', '5', '108785617538783538760452408483163', '2279819218\
0727861167', '164493637239099960712719840940483950285726027116731', '8337580729', '4947999059', '19', '2897', '7511070764480753', '5\
3', '28087', '2182718359336613102811898933144207', '1451', '31159', '1932261797039146667', '293'])

>>> len(set(a))
24
>>> 

Upvotes: 2

Related Questions