Reputation: 115
I am trying to remove elements in a list that are present within other elements
for example:
l = ['word1', 'word1 word2', 'word3 word2', 'word2', 'word4']
the output of the function should be:
['word3 word2', 'word1 word2', 'word4']
because 'word1
' and 'word2
' are present in index 1 and 2 of the original list respectively
This is the algorithm I've come up with. It solves this in ~O(n^2)
l = ['word1', 'word1 word2', 'word3 word2', 'word2', 'word4']
s = set(l)
for root_word in l:
for item in (s-set([root_word])):
if root_word in item:
if root_word in s:
s.remove(root_word)
print(s)
Does anyone know of a more optimal solution? Maybe one using python built-ins to speed things up
Upvotes: 1
Views: 133
Reputation: 552
Your current solution is not O(n^2), but O(m.n^2) where m is the maximum number of words in an element of your list. You can use a frozenset
of words for each entry of your list and then use the issubset
method to test each of them. This will improve the performance for cases where your elements have a lot of words, but the complexity won’t change.
Upvotes: 1
Reputation: 131
l = ['word1', 'word1 word2', 'word3 word2', 'word2', 'word4']
s = set(l)
x = [s.remove(root_word) for root_word in l for item in (s-set([root_word])) if root_word in item if root_word in s ]
print(s)
I used List Comprehension technique to execute take ~20micr0sec
Upvotes: 0