Profy

Reputation: 111

Removing elements that contain specific strings from a list of strings

I have a program that creates lists like this:

["abc a","hello","abc","hello z"]

My goal is to walk through the list and, for each element, remove every other string that contains it.

first iteration:

# abc a can't be found in any other element:

["abc a","hello","abc","hello z"]

second one:

# hello is present in element 4:

["abc a","hello","abc"]

third one:

# abc can be found in element one:

["hello","abc"]

I have tried using the filter() function, without success.

I want every element to go through this check. The only problem is that the list shrinks as elements are removed, and I don't know how to handle that.

Thank you

Upvotes: 1

Views: 108

Answers (2)

Dinesh Kumar

Reputation: 488

When you first get the list, convert it to the form [[element1, status], [element2, status]], where status is either present or deleted. Initially every status is present. While traversing, instead of removing an element, just update its status to deleted, and in every iteration count a match only if the matched element's status is still present. That way the list size stays the same. At the end, pick only the elements whose status is present.

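init_list = ["abc a","hello","abc","hello z"]

# pair every element with a status flag instead of deleting it from the list
new_list = [[s, "present"] for s in init_list]

for i in new_list:
    # an element that was already deleted no longer removes anything
    if i[1] == 'present':
        for j in new_list:
            # skip deleted entries and the element itself
            if j[1] == 'present' and i is not j:
                # mark j as deleted when it contains i as a substring
                if i[0] in j[0]:
                    j[1] = 'deleted'

# keep only the elements that were never marked as deleted
fin_list = [s for s, status in new_list if status == 'present']

print(fin_list)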
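
The same bookkeeping could also be sketched with a parallel list of booleans, which keeps the original strings untouched. This is only one possible variant: the function name remove_containing and the alive list are hypothetical names chosen for illustration. Positions are compared by index, so two identical strings count as separate entries and the later one is removed.

```python
def remove_containing(strings):
    """Drop every string that contains another still-present string."""
    # alive[k] becomes False once strings[k] has been marked as removed
    alive = [True] * len(strings)
    for i, needle in enumerate(strings):
        if not alive[i]:
            continue  # a removed element no longer removes others
        for j, candidate in enumerate(strings):
            # never compare an element with itself, skip removed ones
            if i != j and alive[j] and needle in candidate:
                alive[j] = False
    # the list is never resized during the loops, only filtered at the end
    return [s for s, keep in zip(strings, alive) if keep]


print(remove_containing(["abc a", "hello", "abc", "hello z"]))
# ['hello', 'abc']
```

Because nothing is deleted while the loops run, the shrinking-list problem from the question does not arise, and the survivors keep their original order.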

Upvotes: 1

Jean-François Fabre

Reputation: 140297

One approach would be to:

  • create list of sets of words (by splitting words by spaces)
  • sort the list by smallest elements first
  • rebuild a list while making sure not to repeat words when adding new sets

Like this:

lst = ["abc a","hello","abc","hello z"]

words = sorted([set(x.split()) for x in lst],key=len)

result = []
for l in words:
    if not result or all(l.isdisjoint(x) for x in result):
        result.append(l)

print(result)

This prints the list of sets:

[{'hello'}, {'abc'}]

This approach loses the original order of the strings but won't have issues with word delimiters. A substring approach would look like this:

lst = ["abc a","hello","abc","hello z"]

words = sorted(lst,key=len)

result = []
for l in words:
    if not result or all(x not in l for x in result):
        result.append(l)

print(result)

prints:

['abc', 'hello']

This approach can be problematic with word delimiters (for instance, "abc" is a substring of "abcd"), but the all() condition can easily be adapted with a split to avoid this. For example, a condition like:

if not result or all(set(x.split()).isdisjoint(l.split()) for x in result):

would turn:

lst = ["abc a","hello","abc","abcd","hello z"]

into

['abc', 'abcd', 'hello']
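
For completeness, here is a minimal sketch of that word-based condition plugged into the full loop. The function name keep_shortest_by_words is hypothetical, and the sketch assumes words are separated by whitespace only.

```python
def keep_shortest_by_words(strings):
    """Keep a string only if it shares no whole word with an already kept one."""
    result = []
    # shortest strings first, so "abc" is kept before "abc a" is examined
    for candidate in sorted(strings, key=len):
        candidate_words = set(candidate.split())
        # whole words are compared, so "abc" does not clash with "abcd"
        if all(candidate_words.isdisjoint(kept.split()) for kept in result):
            result.append(candidate)
    return result


print(keep_shortest_by_words(["abc a", "hello", "abc", "abcd", "hello z"]))
# ['abc', 'abcd', 'hello']
```

If the original order matters, the survivors could be re-sorted afterwards by their position in the input list, for example with key=strings.index.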

Upvotes: 1
