Reputation: 51
I have a Python program that outputs a list like this:
['0007', '0016', '0025', '0034', '0043', '0052', '0061', '0070', '0106', '0115', '0124', '0133', '0142', '0151', '0160', '0205', '0214', '0223', '0232', '0241', '0250', '0304', '0313', '0322', '0331', '0340', '0403', '0412', '0421', '0430', '0502', '0511', '0520', '0601', '0610', '0700', '1006', '1015', '1024', '1033', '1042', '1051', '1060', '1105', '1114', '1123', '1132', '1141', '1150', '1204', '1213', '1222', '1231', '1240', '1303', '1312', '1321', '1330', '1402', '1411', '1420', '1501', '1510', '1600', '2005', '2014', '2023', '2032', '2041', '2050', '2104', '2113', '2122', '2131', '2140', '2203', '2212', '2221', '2230', '2302', '2311', '2320', '2401', '2410', '2500', '3004', '3013', '3022', '3031', '3040', '3103', '3112', '3121', '3130', '3202', '3211', '3220', '3301', '3310', '3400', '4003', '4012', '4021', '4030', '4102', '4111', '4120', '4201', '4210', '4300', '5002', '5011', '5020', '5101', '5110', '5200', '6001', '6010', '6100', '7000']
In theory it doesn't contain any duplicates, but it does contain elements like '0007' and '7000' that are made of the same characters (three zeros and one seven), and a standard duplicate-filtering script won't catch them. How can I write one that removes them?
Edit: after consultation it turns out that the order doesn't need to be kept, so your solutions work well. Thanks, everyone!
(If my post is a duplicate, I'm sorry, but I couldn't find any matching questions. Please link one with a solution.)
Upvotes: 2
Views: 77
Reputation: 17322
You could iterate over your list and add each element to a set. Before adding an element, check whether the element or its reverse is already in the set. This handles pairs such as '0007' and '7000', which are reverses of each other:
my_list = ['0007', '7000']
final = set()
for item in my_list:
    # skip the item if it, or its reverse, has already been kept
    if item not in final and item[::-1] not in final:
        final.add(item)
final = list(final)
print(final)
# output: ['0007']
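A quick check of the reverse-only idea, using two hypothetical helper names for illustration: checking the reverse catches '7000', but not another rearrangement such as '0070'. Comparing the sorted characters instead catches every rearrangement.

```python
def dedupe_by_reverse(items):
    """Keep an item unless it or its reverse was already kept (the approach above)."""
    final = set()
    for item in items:
        if item not in final and item[::-1] not in final:
            final.add(item)
    return final


def dedupe_by_sorted_key(items):
    """Keep one item per multiset of characters, keyed on the sorted characters."""
    seen = set()
    kept = []
    for item in items:
        key = ''.join(sorted(item))  # '7000' and '0070' both become '0007'
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


sample = ['0007', '7000', '0070']
print(sorted(dedupe_by_reverse(sample)))  # ['0007', '0070']: '0070' slips through
print(dedupe_by_sorted_key(sample))       # ['0007']
```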
Upvotes: 0
Reputation: 13106
You can use sorted() to sort the characters of each string, creating a canonical key (for example, '7000' becomes '0007') that you can then store in a dictionary for fast lookup:
some_filter = {}  # lookup table of the unique character combinations seen so far
filtered_results = []  # contains the final results
for x in result:  # result is the list from the question
    hashable = "".join(sorted(x))
    if not some_filter.get(hashable):
        filtered_results.append(x)
        some_filter[hashable] = True
print(filtered_results)
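Since the dictionary above only ever stores True, a set can do the same job. A minimal sketch wrapped in a function; the name filter_permutations is hypothetical, and the list is passed in as a parameter instead of relying on a global result:

```python
def filter_permutations(values):
    """Return the first element of each group of strings sharing the same characters."""
    seen = set()  # canonical (sorted) forms already encountered
    filtered_results = []
    for x in values:
        key = "".join(sorted(x))
        if key not in seen:
            seen.add(key)
            filtered_results.append(x)
    return filtered_results


print(filter_permutations(['0007', '0016', '0061', '7000', '0025']))
# ['0007', '0016', '0025']
```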
Upvotes: 0
Reputation: 236004
If you don't mind ending with a list of elements in a different order, here's an idea:
lst = [ ... your input ... ]
uniques = list({''.join(sorted(n)) for n in lst})
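Explanation: sorting the characters of each string maps every rearrangement of the same digits to one canonical string (for example, '7000' and '0070' both become '0007'), and the set comprehension keeps a single copy of each. Because sets are unordered, the order of the resulting list is arbitrary.
The result looks like this: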
['0016', '0124', '1222', '0115', '0034', '0025', '0223', '0007', '1123', '1114', '0133']
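Because string hashing is randomized per process by default, the set's iteration order (and so the list above) can differ between runs. If a reproducible order is enough, rather than the original positions, one option is to sort the canonical forms. A sketch on a shortened sample of the input:

```python
lst = ['0007', '0016', '0025', '0034', '0043', '0052', '0061', '0070', '7000']  # shortened sample

# Sort each string's characters to get a canonical form, dedupe with a set,
# then sort the survivors so the output order is deterministic.
uniques = sorted({''.join(sorted(n)) for n in lst})
print(uniques)  # ['0007', '0016', '0025', '0034']
```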
If you definitely want to keep only the first occurrence of each element, in the original order, you can do it like this, tracking the canonical forms already seen in a separate set:
result = []
seen = set()  # canonical (sorted) forms already kept
for n in lst:
    unique = ''.join(sorted(n))
    if unique not in seen:
        seen.add(unique)
        result.append(n)
print(result)
# ['0007', '0016', '0025', '0034', '0115', '0124', '0133', '0223', '1114', '1123', '1222']
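The same first-occurrence idea can be written once as a reusable helper that takes the grouping rule as a key function, similar in spirit to the unique_everseen recipe in the itertools documentation. The names unique_by and digits_key are hypothetical. The sample input starts with '7000', where the first occurrence is not already in sorted order, to check that the original element (not its sorted form) is the one kept:

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item for each distinct key, preserving input order."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:  # average O(1) membership test in a set
            seen.add(k)
            result.append(item)
    return result


def digits_key(s: str) -> str:
    """Canonical form: the string's characters in sorted order."""
    return ''.join(sorted(s))


print(unique_by(['7000', '0007', '0070', '0016'], key=digits_key))
# ['7000', '0016']
```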
Upvotes: 2
Reputation: 8273
Use a set to check whether an element has already been visited, while maintaining the order. To treat '0007' and '7000' as the same, the set stores the counts of each digit (three 0s and one 7) rather than the element itself:
l = ['0007', '0016', '0025', '0034', '0043', '0052', '0061', '0070', '0106', '0115', '0124', '0133', '0142', '0151', '0160', '0205', '0214', '0223', '0232', '0241', '0250', '0304', '0313', '0322', '0331', '0340', '0403', '0412', '0421', '0430', '0502', '0511', '0520', '0601', '0610', '0700', '1006', '1015', '1024', '1033', '1042', '1051', '1060', '1105', '1114', '1123', '1132', '1141', '1150', '1204', '1213', '1222', '1231', '1240', '1303', '1312', '1321', '1330', '1402', '1411', '1420', '1501', '1510', '1600', '2005', '2014', '2023', '2032', '2041', '2050', '2104', '2113', '2122', '2131', '2140', '2203', '2212', '2221', '2230', '2302', '2311', '2320', '2401', '2410', '2500', '3004', '3013', '3022', '3031', '3040', '3103', '3112', '3121', '3130', '3202', '3211', '3220', '3301', '3310', '3400', '4003', '4012', '4021', '4030', '4102', '4111', '4120', '4201', '4210', '4300', '5002', '5011', '5020', '5101', '5110', '5200', '6001', '6010', '6100', '7000']
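from collections import Counter

s = set()
new_list = []
for i in l:
    # Sorting the digits first gives the Counter a canonical insertion order,
    # so '0007' and '7000' produce the same tuple of (digit, count) pairs
    key = tuple(Counter(sorted(i, key=int)).items())
    if key not in s:
        s.add(key)
        new_list.append(i)
print(new_list)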
Output:
['0007',
'0016',
'0025',
'0034',
'0115',
'0124',
'0133',
'0223',
'1114',
'1123',
'1222']
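Why sorting the digits before building the Counter matters: since Python 3.7, a Counter (like any dict) remembers insertion order, so two strings with the same digits in a different order give Counters that compare equal but turn into different tuples. A small check; the helper name canonical is hypothetical:

```python
from collections import Counter

a, b = '0007', '7000'

# The Counters themselves compare equal (dict equality ignores order)...
print(Counter(a) == Counter(b))                                # True
# ...but their items come out in insertion order, so the tuples differ.
print(tuple(Counter(a).items()) == tuple(Counter(b).items()))  # False


def canonical(s):
    """Sort first so the Counter's insertion order, and thus the tuple, is canonical."""
    return tuple(Counter(sorted(s, key=int)).items())


print(canonical(a) == canonical(b))  # True
print(canonical(b))                  # (('0', 3), ('7', 1))
```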
Upvotes: 0
Reputation: 59426
You should convert your elements to something that compares equal for inputs like "0007" and "7000". The first thing that comes to mind is a Counter. Then put your elements in a set(), which will remove all your doubles:
from collections import Counter
input_elements = ['0007', '0016', '0025', '0034', '0043', '0052', '0061',
'0070', '0106', '0115', '0124', '0133', '0142', '0151',
'0160', '0205', '0214', '0223', '0232', '0241', '0250',
# ...
'7000']
s = set(Counter(e) for e in input_elements)
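Now s would contain a set of all the input_elements with the doubles removed.
Unfortunately, Counters are unhashable (what a pity), so the line above actually raises a TypeError. Instead, you could go with a tuple version of the Counters. Sort the items first: a Counter remembers insertion order, so Counter('0007') and Counter('7000') would otherwise produce different tuples:
s = set(tuple(sorted(Counter(e).items())) for e in input_elements)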
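Because a Counter keeps its keys in insertion order, a plain tuple of its items can differ for '0007' and '7000'. A frozenset of the (digit, count) pairs sidesteps this, since frozensets ignore order. This is a sketch, not from the original answer, on a shortened sample list. Since such a set holds digit counts rather than strings, a dict is used to map each signature back to the first string that produced it:

```python
from collections import Counter

input_elements = ['0007', '0016', '0061', '0070', '7000', '1600']  # shortened sample

# frozenset of (digit, count) pairs: hashable and independent of insertion order
signatures = set(frozenset(Counter(e).items()) for e in input_elements)
print(len(signatures))  # 2

# Map each signature to the first string that produced it, preserving order.
first_by_signature = {}
for e in input_elements:
    first_by_signature.setdefault(frozenset(Counter(e).items()), e)
print(list(first_by_signature.values()))  # ['0007', '0016']
```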
The most beautiful way I can think of is to create your own string class with the property that two strings are considered equal when they contain the same digits, regardless of their order:
class OrderIrrelevantString(str):
    def __hash__(self):
        return hash(''.join(sorted(self)))

    def __eq__(self, other):
        return sorted(self) == sorted(other)
Using this you can do it just like this:
s = set(OrderIrrelevantString(e) for e in input_elements)
The result will then be a set of OrderIrrelevantStrings, which look and behave just like normal strings, so you can probably use them right away for whatever you want to do with them.
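A short check of how such a class behaves (it is repeated here so the snippet runs on its own). When equal elements are added to a set, the one inserted first is kept. Lookups are subtler: a plain str hashes on its own characters, so a plain '7000' will generally not be found even though an equal element is present; wrapping the probe in OrderIrrelevantString avoids that.

```python
class OrderIrrelevantString(str):
    """str whose equality and hash ignore character order (as in the answer above)."""

    def __hash__(self):
        return hash(''.join(sorted(self)))

    def __eq__(self, other):
        return sorted(self) == sorted(other)


s = set(OrderIrrelevantString(e) for e in ['0070', '0007', '7000', '0016'])
print(len(s))                                   # 2
print(sorted(s))                                # ['0016', '0070'] (first-inserted copy kept)
print(OrderIrrelevantString('7000') in s)       # True
print(OrderIrrelevantString('0007') == '7000')  # True
```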
Upvotes: 1
Reputation: 13868
Use set() to eliminate the duplicates, and then use sorted() to put the result back in the original list order:
l = ['0007', '0016', '0025', '0034', '0043', '0052', '0061', '0070', '0106', '0115', '0124', '0133', '0142', '0151', '0160', '0205', '0214', '0223', '0232', '0241', '0250', '0304', '0313', '0322', '0331', '0340', '0403', '0412', '0421', '0430', '0502', '0511', '0520', '0601', '0610', '0700', '1006', '1015', '1024', '1033', '1042', '1051', '1060', '1105', '1114', '1123', '1132', '1141', '1150', '1204', '1213', '1222', '1231', '1240', '1303', '1312', '1321', '1330', '1402', '1411', '1420', '1501', '1510', '1600', '2005', '2014', '2023', '2032', '2041', '2050', '2104', '2113', '2122', '2131', '2140', '2203', '2212', '2221', '2230', '2302', '2311', '2320', '2401', '2410', '2500', '3004', '3013', '3022', '3031', '3040', '3103', '3112', '3121', '3130', '3202', '3211', '3220', '3301', '3310', '3400', '4003', '4012', '4021', '4030', '4102', '4111', '4120', '4201', '4210', '4300', '5002', '5011', '5020', '5101', '5110', '5200', '6001', '6010', '6100', '7000']
sorted({''.join(sorted(x)) for x in l}, key=l.index)
# ['0007', '0016', '0025', '0034', '0115', '0124', '0133', '0223', '1114', '1123', '1222']
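Ordering by l.index only works because the sorted form of every group (for example '0007') is itself present in l; for an input such as ['7000', '0700'], l.index('0007') would raise ValueError, and each lookup rescans the list. A sketch that avoids both, using the fact that dicts keep insertion order (Python 3.7+); the function name is hypothetical:

```python
def canonical_in_first_seen_order(items):
    """Return each distinct sorted-character form, ordered by where its group first appears."""
    # dict.fromkeys drops repeated keys but keeps the order of first insertion
    return list(dict.fromkeys(''.join(sorted(x)) for x in items))


print(canonical_in_first_seen_order(['7000', '0700', '6100', '0016']))
# ['0007', '0016']
```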
Upvotes: 3