Reputation: 65
I am new to Python and I am finding set() to be a bit confusing. Can someone help me create a new list of unique numbers (in other words, eliminate duplicates)?
import re

def go():
    file = open("C:/Cryptography/Pollard/Pollard/newfile.txt", "w")
    filename = "C:/Cryptography/Pollard/Pollard/primeFactors.txt"
    with open(filename, 'r') as f:
        lines = f.read()
    found = re.findall(r'[\d]+[^\d.\d+()+\s]+[^\s]+[\d+\w+\d]+[\d+\^+\d]+[\d+\w+\d]+', lines)
    a = found
    for i in range(5):
        a[i] = str(found[i])
        print(a[i].split('x'))
Now

print(a[i].split('x'))

gives the following output:
['2', '3', '1451', '40591', '258983', '11409589', '8337580729',
'1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['2', '3', '5', '19', '28087', '4947999059',
'2182718359336613102811898933144207']
['3', '5', '53', '293', '31159', '201911', '7511070764480753',
'22798192180727861167']
['2', '164493637239099960712719840940483950285726027116731']
How do I output a list of only non-repeating numbers? I read on the forums that set() can do this, but I have tried it to no avail. Any help is much appreciated!
Upvotes: 2
Views: 22725
Reputation: 93
If you want unique values from the flattened list, you can use reduce() to flatten the list of lists, then pass the result to the frozenset() constructor and convert back to a list (note this example uses Python 2 syntax):
>>> data = [
['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
['2897', '514081', '585530047', '108785617538783538760452408483163'],
['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
['2', '164493637239099960712719840940483950285726027116731']]
>>> print list(frozenset(reduce((lambda a, b: a+b), data)))
['514081', '258983', '40591', '201911', '11409589', '585530047', '3',
'2', '5', '108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729',
'4947999059', '19', '2897', '7511070764480753', '53', '28087',
'2182718359336613102811898933144207', '1451', '31159',
'1932261797039146667', '293']
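The snippet above is Python 2; on Python 3, reduce lives in functools and print is a function. A minimal Python 3 sketch of the same approach, also showing itertools.chain as a cheaper way to flatten (the small data list here is illustrative):

```python
from functools import reduce  # reduce is not a builtin in Python 3
from itertools import chain

data = [['2', '3', '1451'],
        ['2', '3', '5']]

# Same approach as above, written for Python 3
flat = reduce(lambda a, b: a + b, data)
print(list(frozenset(flat)))  # duplicates removed, arbitrary order

# chain.from_iterable avoids the cost of repeated list concatenation
print(set(chain.from_iterable(data)))  # same elements, arbitrary order
```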
Upvotes: 0
Reputation: 104852
A set is a collection (like a list or tuple), but it does not allow duplicates and has very fast membership testing. You can use a list comprehension to filter out values in one list that have appeared in a previous list:
data = [['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
        ['2897', '514081', '585530047', '108785617538783538760452408483163'],
        ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
        ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
        ['2', '164493637239099960712719840940483950285726027116731']]

seen = set()  # set of seen values, which starts out empty
for lst in data:
    deduped = [x for x in lst if x not in seen]  # filter out previously seen values
    seen.update(deduped)  # add the new values to the set
    print(deduped)        # do whatever with the deduped list
Output:
['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['5', '19', '28087', '4947999059', '2182718359336613102811898933144207']
['53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']
['164493637239099960712719840940483950285726027116731']
Note that this version does not filter out values that are duplicated within a single list (unless they're already duplicates of a value in a previous list). You could work around that by replacing the list comprehension with an explicit loop that checks each individual value against the seen set (and adds it if it's new) before appending to a list for output. Or, if the order of the items in your sub-lists is not important, you could turn them into sets of their own:
seen = set()
for lst in data:
    lst_as_set = set(lst)            # this step eliminates internal duplicates
    deduped_set = lst_as_set - seen  # set subtraction!
    seen.update(deduped_set)
    # now do stuff with deduped_set, which is iterable, but in an arbitrary order
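The explicit-loop alternative mentioned above, which preserves order and also drops duplicates inside a single sub-list, might be sketched like this (the small data value is illustrative):

```python
data = [['2', '3', '2', '1451'],
        ['2', '5', '5']]

seen = set()
for lst in data:
    deduped = []
    for x in lst:
        if x not in seen:     # skip values seen in this or any earlier list
            seen.add(x)
            deduped.append(x)
    print(deduped)
# prints ['2', '3', '1451'] then ['5']
```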
Finally, if the internal sub-lists are a red herring entirely and you want to simply filter a flattened list to get only unique values, that sounds like a job for the unique_everseen recipe from the itertools documentation:
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
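The recipe above is written for Python 2 (ifilterfalse); on Python 3 the equivalent is itertools.filterfalse. A hedged sketch of a Python 3 version, with usage on flattened sub-lists via itertools.chain:

```python
from itertools import chain, filterfalse

def unique_everseen(iterable, key=None):
    """Python 3 version of the recipe (ifilterfalse -> filterfalse)."""
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

data = [['2', '3', '1451'],
        ['2', '3', '5']]
print(list(unique_everseen(chain.from_iterable(data))))
# prints ['2', '3', '1451', '5'] -- order of first appearance preserved
```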
Upvotes: 4
Reputation: 40874
set should work in this case. You can try the following:
# Concat all your lists into a single list
>>> a = ['2', '3', '1451', '40591', '258983', '11409589', '8337580729','1932261797039146667'] +['2897', '514081', '585530047', '108785617538783538760452408483163'] +['2', '3', '5', '19', '28087', '4947999059','2182718359336613102811898933144207'] + ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']+ ['2', '164493637239099960712719840940483950285726027116731']
>>> len(a)
29
>>> set(a)
set(['514081', '258983', '40591', '201911', '11409589', '585530047', '3', '2', '5', '108785617538783538760452408483163', '2279819218\
0727861167', '164493637239099960712719840940483950285726027116731', '8337580729', '4947999059', '19', '2897', '7511070764480753', '5\
3', '28087', '2182718359336613102811898933144207', '1451', '31159', '1932261797039146667', '293'])
>>> len(set(a))
24
>>>
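Since a set prints in arbitrary order, one option (an assumption about what you want, not part of the answer above) is to sort the unique values numerically; because the items are strings, sorted needs key=int:

```python
a = ['2', '3', '1451', '2', '3', '5', '1451']

unique = set(a)                    # removes duplicates, arbitrary order
ordered = sorted(unique, key=int)  # numeric sort on the string values
print(ordered)
# prints ['2', '3', '5', '1451']
```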
Upvotes: 2