cheryl

Reputation: 39

Remove stop words in text file without NLTK

I have two files: stopwords.txt and a.txt.
I want to remove the stop words listed in stopwords.txt from the text in a.txt and get the remaining words separated by white space.

How do I do it? This is what I've tried to do:

def remove_stopwords(review_words):
    with open('stopwords.txt') as stopfile:
        stopwords = stopfile.read()
        list = stopwords.split()
        print(list)
        with open('a.txt') as workfile:
            read_data = workfile.read()
            data = read_data.split()
            print(data)
            for word1 in list:
                for word2 in data:
                    if word1 == word2:
                        return data.remove(list)
                        print(remove_Stopwords)

Thanks in advance

Upvotes: 3

Views: 13700

Answers (2)

Simeon Ikudabo

Reputation: 2190

Here is an example:

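stop_words = []
words = []
with open('stopwords.txt', 'r') as f:
    for line in f:
        # split() copes with several words per line and drops the newline
        stop_words.extend(line.split())

with open('a.txt', 'r') as f_obj:
    for line in f_obj:
        words.extend(line.split())

# keep only the words that are not in the stop word list
filtered = [w for w in words if w not in stop_words]
print(' '.join(filtered))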

Read each word from the stop word file into a list, then read each word from the other file. A list comprehension then keeps only the words that do not appear in the stop word list.
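
The same steps can also be wrapped in a function. This is only a minimal sketch, not part of the answer above: the function name `remove_stopwords` and its two parameters are hypothetical, and it assumes both files hold plain whitespace-separated words. It stores the stop words in a set so that each membership test is fast.

```python
def remove_stopwords(text_path, stopwords_path):
    """Return the words of text_path that are not listed in stopwords_path."""
    # Collect the stop words in a set for fast membership tests.
    with open(stopwords_path, 'r') as stop_file:
        stop_words = set(stop_file.read().split())

    kept = []
    with open(text_path, 'r') as text_file:
        for line in text_file:
            # split() with no arguments splits on any whitespace
            # and drops the trailing newline.
            kept.extend(word for word in line.split() if word not in stop_words)

    # Join the remaining words with single spaces.
    return ' '.join(kept)
```

Calling `print(remove_stopwords('a.txt', 'stopwords.txt'))` would then print the remaining words on one line.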

Upvotes: 1

U13-Forward

Reputation: 71610

a.txt:

good great bad

stopwords.txt:

good bad

Maybe:

with open('a.txt', 'r') as f, open('stopwords.txt', 'r') as f2:
    a = f.read().split()
    # lower-case the stop words once so the comparison ignores case
    b = {x.lower() for x in f2.read().split()}
print(' '.join(i for i in a if i.lower() not in b))
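
To check this against the sample contents without creating any files, the same filtering can be run on plain strings. This is an illustrative sketch only: `filter_words` is a hypothetical helper name, the first call uses the sample contents of a.txt and stopwords.txt shown above, and the second call uses made-up capitalisation to show that the comparison ignores case.

```python
def filter_words(text, stop_text):
    """Drop every word of text that appears in stop_text, ignoring case."""
    # Lower-case the stop words once, so each lookup is a cheap set test.
    stop_words = {word.lower() for word in stop_text.split()}
    return ' '.join(word for word in text.split() if word.lower() not in stop_words)


result = filter_words('good great bad', 'good bad')
print(result)  # great

mixed_case = filter_words('Good great BAD', 'good bad')
print(mixed_case)  # great
```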

Upvotes: 0
