Jaho

Reputation: 51

Removing list elements that are made of the same numbers

I have a Python program that outputs a list like this:

['0007', '0016', '0025', '0034', '0043', '0052', '0061', '0070', '0106', '0115', '0124', '0133', '0142', '0151', '0160', '0205', '0214', '0223', '0232', '0241', '0250', '0304', '0313', '0322', '0331', '0340', '0403', '0412', '0421', '0430', '0502', '0511', '0520', '0601', '0610', '0700', '1006', '1015', '1024', '1033', '1042', '1051', '1060', '1105', '1114', '1123', '1132', '1141', '1150', '1204', '1213', '1222', '1231', '1240', '1303', '1312', '1321', '1330', '1402', '1411', '1420', '1501', '1510', '1600', '2005', '2014', '2023', '2032', '2041', '2050', '2104', '2113', '2122', '2131', '2140', '2203', '2212', '2221', '2230', '2302', '2311', '2320', '2401', '2410', '2500', '3004', '3013', '3022', '3031', '3040', '3103', '3112', '3121', '3130', '3202', '3211', '3220', '3301', '3310', '3400', '4003', '4012', '4021', '4030', '4102', '4111', '4120', '4201', '4210', '4300', '5002', '5011', '5020', '5101', '5110', '5200', '6001', '6010', '6100', '7000']

In theory it doesn't contain any duplicates, but it does contain elements like '0007' and '7000' that are made of the same digits (three zeros and one seven), and a standard duplicate filter won't catch them. How can I write one that removes them? Edit: after consultation it turns out that the order doesn't need to be kept, so your solutions work well. Thanks, guys.

(If my post is a duplicate, I apologize; I couldn't find an equivalent question. Please link one with a solution.)

Upvotes: 2

Views: 77

Answers (6)

kederrac

Reputation: 17322

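You could iterate over your list and add each element to a set; before adding an element, check whether the element or its reverse already exists in the set. Note that this only catches exact mirror images such as '0007' and '7000'; other rearrangements of the same digits, such as '0070', are not detected.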

my_list = ['0007', '7000']
final = set()
for item in my_list:
    if item not in final and item[::-1] not in final:
        final.add(item)
final = list(final)
print(final) 
# output: ['0007']
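
To make the limit of the reversal check concrete, here is a small sketch that compares it with a check on the sorted characters. The function names `dedupe_by_reverse` and `dedupe_by_digits` are made up for this illustration and are not part of the answer above.

```python
def dedupe_by_reverse(items):
    """Keep an item unless it or its mirror image was already kept."""
    kept = set()
    for item in items:
        if item not in kept and item[::-1] not in kept:
            kept.add(item)
    return kept


def dedupe_by_digits(items):
    """Keep the first item for each combination of characters."""
    seen = set()   # sorted-character keys already encountered
    kept = []
    for item in items:
        key = "".join(sorted(item))
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


sample = ["0007", "0070", "0700", "7000"]
print(sorted(dedupe_by_reverse(sample)))  # ['0007', '0070']
print(dedupe_by_digits(sample))           # ['0007']
```

All four sample strings consist of three zeros and one seven, yet the reversal check keeps two of them because '0070' is not the mirror image of '0007'.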

Upvotes: 0

C.Nivs

Reputation: 13106

You can use sorted to sort the characters of each string, which gives one canonical key per combination of digits, and then store those keys in a dictionary for fast lookup:

some_filter = {} # lookup table of the character combinations seen so far
filtered_results = [] # will contain the final results

for x in result: # result is the list your program outputs
    hashable = "".join(sorted(x)) # '7000' -> '0007'
    if not some_filter.get(hashable):
        filtered_results.append(x) # first element with this combination
        some_filter[hashable] = True

print(filtered_results)

Upvotes: 0

Óscar López

Reputation: 236004

If you don't mind ending with a list of elements in a different order, here's an idea:

lst = [ ... your input ... ]
uniques = list({''.join(sorted(n)) for n in lst})

Explanation:

  • Each string in the input is turned into a sorted list of its characters, so the same combination in a different order gives the same list
  • After that, we join each list back into a string
  • We remove the duplicates by using a set comprehension
  • Finally, we convert everything back into a list (note that it holds the sorted strings, not necessarily the original elements)

The result looks like this:

['0016', '0124', '1222', '0115', '0034', '0025', '0223', '0007', '1123', '1114', '0133']

If you definitely want to keep only the first occurrence of each combination, in the original order, we can do it like this, at the cost of an explicit loop:

result = []
seen = set()  # sorted-character keys already encountered
for n in lst:
    unique = ''.join(sorted(n))
    if unique not in seen:
        seen.add(unique)
        result.append(n)

result
=> ['0007', '0016', '0025', '0034', '0115', '0124', '0133', '0223', '1114', '1123', '1222']

Upvotes: 2

mad_

Reputation: 8273

Use a set to record what has already been seen while preserving the original order. Instead of the element itself, the set stores its digit counts (for example, three 0s and one 7), so '0007' and '7000' are treated as the same and only the first one is kept.

l = ['0007', '0016', '0025', '0034', '0043', '0052', '0061', '0070', '0106', '0115',  '0124', '0133', '0142', '0151', '0160', '0205', '0214', '0223', '0232', '0241', '0250', '0304', '0313', '0322', '0331', '0340', '0403', '0412', '0421', '0430', '0502', '0511', '0520', '0601', '0610', '0700', '1006', '1015', '1024', '1033', '1042', '1051', '1060', '1105', '1114', '1123', '1132', '1141', '1150', '1204', '1213', '1222', '1231', '1240', '1303', '1312', '1321', '1330', '1402', '1411', '1420', '1501', '1510', '1600', '2005', '2014', '2023', '2032', '2041', '2050', '2104', '2113', '2122', '2131', '2140', '2203', '2212', '2221', '2230', '2302', '2311', '2320', '2401', '2410', '2500', '3004', '3013', '3022', '3031', '3040', '3103', '3112', '3121', '3130', '3202', '3211', '3220', '3301', '3310', '3400', '4003', '4012', '4021', '4030', '4102', '4111', '4120', '4201', '4210', '4300', '5002', '5011', '5020', '5101', '5110', '5200', '6001', '6010', '6100', '7000']

from collections import Counter
s = set()
new_list = []
for i in l:
    # digit counts in ascending digit order, e.g. (('0', 3), ('7', 1))
    key = tuple(Counter(sorted(i, key=int)).items())
    if key not in s:
        s.add(key)
        new_list.append(i)

Output:

['0007',
 '0016',
 '0025',
 '0034',
 '0115',
 '0124',
 '0133',
 '0223',
 '1114',
 '1123',
 '1222']

Upvotes: 0

Alfe

Reputation: 59426

You should convert your elements to something that compares equal for inputs like "0007" and "7000". The first thing that comes to mind is a Counter. Then put the converted elements in a set(), which will remove all your duplicates:

from collections import Counter

input_elements = ['0007', '0016', '0025', '0034', '0043', '0052', '0061',
                  '0070', '0106', '0115', '0124', '0133', '0142', '0151',
                  '0160', '0205', '0214', '0223', '0232', '0241', '0250',
                  # ...
                  '7000']
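s = set(Counter(e) for e in input_elements)

Ideally, s would now be a set of all the input_elements with the duplicates removed. Unfortunately, Counters are unhashable (what a pity), so this raises a TypeError. You could go with a hashable version of the Counters instead; a frozenset of the items works regardless of the order in which the digits appear:

s = set(frozenset(Counter(e).items()) for e in input_elements)

Note that s then holds the digit counts rather than the original strings.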
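
As a quick check of which Counter-derived keys are safe to put in a set, the sketch below compares a tuple of the items with a frozenset of the items. It assumes Python 3.7 or later, where a Counter yields its items in first-seen order; the helper name `digit_signature` is hypothetical.

```python
from collections import Counter

a, b = Counter("0007"), Counter("7000")

# Counters compare equal when their counts match ...
print(a == b)                                        # True
# ... but their items come out in first-seen order, so the tuples differ.
print(tuple(a.items()) == tuple(b.items()))          # False
# A frozenset of (digit, count) pairs is hashable and ignores that order.
print(frozenset(a.items()) == frozenset(b.items()))  # True


def digit_signature(text):
    """Hashable, order-independent summary of the characters in text."""
    return frozenset(Counter(text).items())


signatures = {digit_signature(e) for e in ["0007", "0070", "7000", "0016"]}
print(len(signatures))  # 2
```

Sorting the items before building the tuple, as in `tuple(sorted(Counter(e).items()))`, would be another way to get an order-independent key.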

The most beautiful way I can think of is to create your own string class which has this specific property that things are considered equal when they have the same digits, regardless of their order:

class OrderIrrelevantString(str):
  def __hash__(self):
    return hash(''.join(sorted(self)))
  def __eq__(self, other):
    return sorted(self) == sorted(other)

Using this you can do it just like this:

s = set(OrderIrrelevantString(e) for e in input_elements)

The result then will be a set of OrderIrrelevantStrings which will look and behave just like normal strings, so you probably can use them for whatever you want to do with them right away.
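
A short usage sketch of the class, repeated here so the snippet runs on its own; the four-element sample list is only an illustration. It relies on a set keeping the element that was inserted first when a later one compares equal.

```python
class OrderIrrelevantString(str):
    def __hash__(self):
        # Same characters in any order give the same hash.
        return hash(''.join(sorted(self)))

    def __eq__(self, other):
        return sorted(self) == sorted(other)


items = ["0007", "0016", "0061", "7000"]
unique = set(OrderIrrelevantString(e) for e in items)

print(len(unique))                              # 2
print(OrderIrrelevantString("0007") == "7000")  # True

# Convert back to plain strings if the special equality is no longer wanted.
plain = sorted(str(u) for u in unique)
print(plain)                                    # ['0007', '0016']
```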

Upvotes: 1

r.ook

Reputation: 13868

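Use set() to eliminate the duplicates, and then use sorted() to restore the original list order. This relies on the sorted form of every element (e.g. '0007') also being present in the list, which is the case here; otherwise l.index would raise a ValueError.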

l = ['0007', '0016', '0025', '0034', '0043', '0052', '0061', '0070', '0106', '0115',  '0124', '0133', '0142', '0151', '0160', '0205', '0214', '0223', '0232', '0241', '0250', '0304', '0313', '0322', '0331', '0340', '0403', '0412', '0421', '0430', '0502', '0511', '0520', '0601', '0610', '0700', '1006', '1015', '1024', '1033', '1042', '1051', '1060', '1105', '1114', '1123', '1132', '1141', '1150', '1204', '1213', '1222', '1231', '1240', '1303', '1312', '1321', '1330', '1402', '1411', '1420', '1501', '1510', '1600', '2005', '2014', '2023', '2032', '2041', '2050', '2104', '2113', '2122', '2131', '2140', '2203', '2212', '2221', '2230', '2302', '2311', '2320', '2401', '2410', '2500', '3004', '3013', '3022', '3031', '3040', '3103', '3112', '3121', '3130', '3202', '3211', '3220', '3301', '3310', '3400', '4003', '4012', '4021', '4030', '4102', '4111', '4120', '4201', '4210', '4300', '5002', '5011', '5020', '5101', '5110', '5200', '6001', '6010', '6100', '7000']

sorted(set(''.join(sorted(x)) for x in l), key=l.index)

# ['0007', '0016', '0025', '0034', '0115', '0124', '0133', '0223', '1114', '1123', '1222']
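
If the list might not contain the sorted form of every element, one possible variation is to key a dictionary on the sorted characters and keep the first element seen for each key. This is a sketch rather than part of the answer above; the function name `first_per_digit_group` is made up, and it assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
def first_per_digit_group(items):
    """Return the first element for each combination of characters."""
    first_seen = {}
    for item in items:
        # setdefault stores item only if this key has not been seen yet.
        first_seen.setdefault("".join(sorted(item)), item)
    return list(first_seen.values())


print(first_per_digit_group(["7000", "0610", "1006"]))  # ['7000', '0610']
```

This also avoids the repeated l.index lookups, since each element is visited once.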

Upvotes: 3
