Justin
Justin

Reputation: 115

Remove words in list that are present in other elements

I am trying to remove elements in a list that are present within other elements

for example:

l = ['word1', 'word1 word2', 'word3 word2', 'word2', 'word4']

the output of the function should be:

['word3 word2', 'word1 word2', 'word4']

because 'word1' and 'word2' are present in index 1 and 2 of the original list respectively

This is the algorithm I've come up with. It solves this in ~O(n^2)

l = ['word1', 'word1 word2', 'word3 word2', 'word2', 'word4']

s = set(l)

for root_word in l:
    for item in (s-set([root_word])):
        if root_word in item:
            if root_word in s:
                s.remove(root_word)

print(s) 

Does anyone know of a more optimal solution? Maybe one using python built-ins to speed things up

Upvotes: 1

Views: 133

Answers (2)

TUI lover
TUI lover

Reputation: 552

Your current solution is not O(n^2), but O(m.n^2) where m is the maximum number of words in an element of your list. You can use a frozenset of words for each entry of your list and then use the issubset method to test each of them. This will improve the performance for cases where your elements have a lot of words, but the complexity won’t change.

Upvotes: 1

Hemanth kumar
Hemanth kumar

Reputation: 131

l = ['word1', 'word1 word2', 'word3 word2', 'word2', 'word4']
s = set(l)
x = [s.remove(root_word) for root_word in l for item in (s-set([root_word])) if root_word in item if root_word in s ]   
print(s)

I used List Comprehension technique to execute take ~20micr0sec

Upvotes: 0

Related Questions