gilbo184

Reputation: 89

How to compare a word in one list to another word in a list

I have this list here:

list1

 ['mississippi', 'well', 'worth', 'reading', 'not', 'commonplace', 'river', 'contrary', 'ways', 'remarkable', 'considering', 'missouri', 'main', 'branch', 'longest', 'river', 'world--four', 'miles', 'seems', 'safe', 'say', 'also', 'crookedest', 'river', 'since', 'in', 'one', 'part', 'journey', 'uses', 'one', 'three', 'miles', 'cover', 'same', 'ground', 'crow', 'fly', 'in', 'six', 'seventy-five', 'it', 'discharges', 'three', 'water', 'st', 'lawrence', 'twenty-five', 'as', 'as', 'rhine', 'three', 'thirty-eight', 'times', 'as', 'much', 'as', 'the', 'thames', 'other', 'river', 'so', 'vast', 'drainage-basin:', 'it', 'draws', 'its', 'water', 'supply', 'twenty-eight', 'states', 'territories', 'delaware', 'the', 'atlantic', 'seaboard', 'the', 'country', 'idaho', 'on', 'the', 'pacific', 'slope--a', 'spread', 'forty-five', 'degrees', 'longitude', 'the', 'mississippi', 'receives', 'carries', 'the', 'gulf', 'water', 'from', 'fifty-four', 'subordinate', 'rivers', 'are', 'navigable', 'steamboats', 'from', 'hundreds', 'that', 'are', 'navigable', 'flats', 'and', 'keels', 'the', 'area', 'its', 'drainage-basin', 'is', 'as', 'as', 'the', 'combined', 'areas', 'england', 'wales', 'scotland', 'ireland', 'france', 'spain', 'portugal', 'germany', 'austria', 'italy', 'and', 'turkey', 'and', 'almost', 'all', 'this', 'wide', 'region', 'is', 'fertile', 'the', 'mississippi', 'valley', 'proper', 'is', 'exceptionally', 'so']

I also have another list of common words here:

list2

['a', 'about', 'after', 'again', 'against', 'ago', 'all', 'along', 'also', 'always', 'an', 'and', 'another', 'any', 'are', 'around', 'as', 'at', 'away', 'back', 'be', 'because', 'been', 'before', 'began', 'being', 'between', 'both', 'but', 'by', 'came', 'can', 'come', 'could', 'course', 'day', 'days', 'did', 'do', 'down', 'each', 'end', 'even', 'ever', 'every', 'first', 'for', 'four', 'from', 'get', 'give', 'go', 'going', 'good', 'got', 'great', 'had', 'half', 'has', 'have', 'he', 'head', 'her', 'here', 'him', 'his', 'house', 'how', 'hundred', 'i', 'if', 'in', 'into', 'is', 'it', 'its', 'just', 'know', 'last', 'left', 'life', 'like', 'little', 'long', 'look', 'made', 'make', 'man', 'many', 'may', 'me', 'men', 'might', 'miles', 'more', 'most', 'mr', 'much', 'must', 'my', 'never', 'new', 'next', 'no', 'not', 'nothing', 'now', 'of', 'off', 'old', 'on', 'once', 'one', 'only', 'or', 'other', 'our', 'out', 'over', 'own', 'people', 'pilot', 'place', 'put', 'right', 'said', 'same', 'saw', 'say', 'says', 'see', 'seen', 'she', 'should', 'since', 'so', 'some', 'state', 'still', 'such', 'take', 'tell', 'than', 'that', 'the', 'their', 'them', 'then', 'there', 'these', 'they', 'thing', 'think', 'this', 'those', 'thousand', 'three', 'through', 'time', 'times', 'to', 'told', 'too', 'took', 'two', 'under', 'up', 'upon', 'us', 'use', 'used', 'very', 'want', 'was', 'way', 'we', 'well', 'went', 'were', 'what', 'when', 'where', 'which', 'while', 'who', 'will', 'with', 'without', 'work', 'world', 'would', 'year', 'years', 'yes', 'yet', 'you', 'young', 'your']

What I want to do is go through every word in list1 and, if that word also appears in list2, delete it from list1.

This is how I tried to tackle it:

for w in text1:
    for j in text2:
        if text[w] == text2[j]:
        text.remove[w]
    print(text)

Error message:

text[w] == text2[j] must be an integer or a slice, not str

The objective is to remove the common words from the first list by comparing it with list2. This could be the wrong way to go about it.

Thanks.

Upvotes: 1

Views: 906

Answers (5)

vash_the_stampede

Reputation: 4606

The other methods seem to have been covered already. Another one that works here is the traditional filter (as opposed to filterfalse). You filter list1 for all the elements that are not in list2, so any word that appears in list2 is left out of the filtered result:

list3 = list(filter(lambda x: x not in list2, list1))

Upvotes: 0

Karn Kumar

Reputation: 8816

You should consider using set(); this is the kind of task it is made for.

Your list1

>>> lst1 =  ['mississippi', 'well', 'worth', 'reading', 'not', 'commonplace', 'river', 'contrary', 'ways', 'remarkable', 'considering', 'missouri', 'main', 'branch', 'longest', 'river', 'world--four', 'miles', 'seems', 'safe', 'say', 'also', 'crookedest', 'river', 'since', 'in', 'one', 'part', 'journey', 'uses', 'one', 'three', 'miles', 'cover', 'same', 'ground', 'crow', 'fly', 'in', 'six', 'seventy-five', 'it', 'discharges', 'three', 'water', 'st', 'lawrence', 'twenty-five', 'as', 'as', 'rhine', 'three', 'thirty-eight', 'times', 'as', 'much', 'as', 'the', 'thames', 'other', 'river', 'so', 'vast', 'drainage-basin:', 'it', 'draws', 'its', 'water', 'supply', 'twenty-eight', 'states', 'territories', 'delaware', 'the', 'atlantic', 'seaboard', 'the', 'country', 'idaho', 'on', 'the', 'pacific', 'slope--a', 'spread', 'forty-five', 'degrees', 'longitude', 'the', 'mississippi', 'receives', 'carries', 'the', 'gulf', 'water', 'from', 'fifty-four', 'subordinate', 'rivers', 'are', 'navigable', 'steamboats', 'from', 'hundreds', 'that', 'are', 'navigable', 'flats', 'and', 'keels', 'the', 'area', 'its', 'drainage-basin', 'is', 'as', 'as', 'the', 'combined', 'areas', 'england', 'wales', 'scotland', 'ireland', 'france', 'spain', 'portugal', 'germany', 'austria', 'italy', 'and', 'turkey', 'and', 'almost', 'all', 'this', 'wide', 'region', 'is', 'fertile', 'the', 'mississippi', 'valley', 'proper', 'is', 'exceptionally', 'so']

Your list2

>>> lst2 = ['a', 'about', 'after', 'again', 'against', 'ago', 'all', 'along', 'also', 'always', 'an', 'and', 'another', 'any', 'are', 'around', 'as', 'at', 'away', 'back', 'be', 'because', 'been', 'before', 'began', 'being', 'between', 'both', 'but', 'by', 'came', 'can', 'come', 'could', 'course', 'day', 'days', 'did', 'do', 'down', 'each', 'end', 'even', 'ever', 'every', 'first', 'for', 'four', 'from', 'get', 'give', 'go', 'going', 'good', 'got', 'great', 'had', 'half', 'has', 'have', 'he', 'head', 'her', 'here', 'him', 'his', 'house', 'how', 'hundred', 'i', 'if', 'in', 'into', 'is', 'it', 'its', 'just', 'know', 'last', 'left', 'life', 'like', 'little', 'long', 'look', 'made', 'make', 'man', 'many', 'may', 'me', 'men', 'might', 'miles', 'more', 'most', 'mr', 'much', 'must', 'my', 'never', 'new', 'next', 'no', 'not', 'nothing', 'now', 'of', 'off', 'old', 'on', 'once', 'one', 'only', 'or', 'other', 'our', 'out', 'over', 'own', 'people', 'pilot', 'place', 'put', 'right', 'said', 'same', 'saw', 'say', 'says', 'see', 'seen', 'she', 'should', 'since', 'so', 'some', 'state', 'still', 'such', 'take', 'tell', 'than', 'that', 'the', 'their', 'them', 'then', 'there', 'these', 'they', 'thing', 'think', 'this', 'those', 'thousand', 'three', 'through', 'time', 'times', 'to', 'told', 'too', 'took', 'two', 'under', 'up', 'upon', 'us', 'use', 'used', 'very', 'want', 'was', 'way', 'we', 'well', 'went', 'were', 'what', 'when', 'where', 'which', 'while', 'who', 'will', 'with', 'without', 'work', 'world', 'would', 'year', 'years', 'yes', 'yet', 'you', 'young', 'your']

List comparison:

>>> newlst = set(lst1) - set(lst2)
>>> newlst
{'uses', 'territories', 'area', 'longitude', 'twenty-eight', 'flats', 'crookedest', 'longest', 'country', 'cover', 'degrees', 'crow', 'six', 'ireland', 'missouri', 'combined', 'fertile', 'st', 'branch', 'commonplace', 'receives', 'draws', 'navigable', 'twenty-five', 'journey', 'pacific', 'carries', 'thirty-eight', 'keels', 'rhine', 'delaware', 'italy', 'thames', 'areas', 'exceptionally', 'england', 'spain', 'valley', 'seaboard', 'drainage-basin', 'seventy-five', 'water', 'almost', 'ways', 'atlantic', 'discharges', 'considering', 'slope--a', 'hundreds', 'part', 'supply', 'lawrence', 'france', 'region', 'safe', 'remarkable', 'vast', 'austria', 'forty-five', 'portugal', 'spread', 'states', 'worth', 'mississippi', 'idaho', 'fly', 'steamboats', 'seems', 'wide', 'scotland', 'germany', 'contrary', 'river', 'ground', 'wales', 'drainage-basin:', 'proper', 'reading', 'rivers', 'fifty-four', 'subordinate', 'turkey', 'world--four', 'gulf', 'main'}

Or simply use:

>>> set(lst1).difference(lst2)

Note: Be careful if order matters, because sets do not preserve it. Converting to a set also collapses repeated words into a single entry.
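
If the original order and repeated words need to survive, one option is to keep lst1 as a list and use a set only for the membership test. This is a minimal sketch with shortened sample data; the names common and filtered are illustrative and not from the question:

```python
# Shortened sample data; the real lists are much longer.
lst1 = ['mississippi', 'well', 'worth', 'reading', 'not', 'river', 'river']
lst2 = ['not', 'well', 'the']

# Build the set once so each "in" check is a hash lookup, not a list scan.
common = set(lst2)

# The comprehension walks lst1 in order, so order and duplicates are kept.
filtered = [word for word in lst1 if word not in common]
print(filtered)  # ['mississippi', 'worth', 'reading', 'river', 'river']
```

The set here only speeds up the lookups; the result is still an ordinary list.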

Upvotes: 1

kmario23

Reputation: 61305

If the items in the lists are unique and if you also don't care about the order, then you can use set

set(list1) - set(list2)

This returns the elements from list1 that are not in list2, as a set.
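
To illustrate why uniqueness matters here, a small sketch with made-up sample data: a word that occurs several times in list1 shows up only once in the result, and the result is a set rather than a list.

```python
# Made-up sample data for illustration only.
list1 = ['river', 'the', 'river', 'in', 'water']
list2 = ['the', 'in']

result = set(list1) - set(list2)

# 'river' occurs twice in list1 but only once in the set.
print(len(result))     # 2
# sorted() gives a list in a predictable (alphabetical) order.
print(sorted(result))  # ['river', 'water']
```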

Upvotes: 0

Austin

Reputation: 26039

Use a list-comprehension:

lst1 = [x for x in lst1 if x not in lst2]

This keeps only the items from lst1 that are not in lst2, which is simple and concise.

Evaluating your code

It's not advisable to remove items from a list while iterating over it: the list shrinks underneath the loop, so some elements get skipped.

Also, a Python for loop works like a foreach loop, so when you write for w in text1:, w is each item of text1 in turn, not an index. In this context, text[w] raises a TypeError because list indices must be integers or slices, not str. You just need w there. Note also that remove is a method, so it is called with parentheses: remove(w), not remove[w].
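
For reference, here is a sketch of how the original loop could be repaired while keeping its overall shape. It assumes the lists are named text1 and text2 as in the question's loop, and it uses shortened sample data:

```python
# Shortened sample data standing in for the question's lists.
text1 = ['well', 'not', 'river', 'in', 'in', 'water']
text2 = ['well', 'not', 'in']

# Iterate over a copy (text1[:]) so removals don't disturb the loop.
for w in text1[:]:
    if w in text2:       # w is the word itself, not an index
        text1.remove(w)  # remove() is a method: parentheses, not brackets

print(text1)  # ['river', 'water']
```

With this sample data, looping over text1 directly instead of the copy would leave 'not' and one 'in' behind, because each removal shifts the remaining items and the loop steps past the next one.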

Upvotes: 3

U13-Forward

Reputation: 71560

Or use filterfalse from itertools:

from itertools import filterfalse
print(list(filterfalse(list2.__contains__, list1)))

Demo:

from itertools import filterfalse

list1 = ['a', 'b']
list2 = ['a']
print(list(filterfalse(list2.__contains__, list1)))

Output:

['b']

Upvotes: 0
