plshelpme_

Reputation: 57

How to remove Stopwords from CSV file using NLTK?

I am trying to remove stopwords from a CSV file that has 3 columns and write the result to a new CSV file. The stopwords are removed successfully; however, the data in the new file appears across the top row rather than in the columns of the original file.

    import io 
    import codecs
    import csv
    from nltk.corpus import stopwords 
    from nltk.tokenize import word_tokenize 

    stop_words = set(stopwords.words('english')) 
    file1 = codecs.open('soccer.csv','r','utf-8') 
    line = file1.read() 
    words = line.split()
    for r in words: 
        if not r in stop_words: 
            appendFile = open('stopwords_soccer.csv','a', encoding='utf-8') 
            appendFile.write(" "+r)
            appendFile.close()

Upvotes: 1

Views: 4107

Answers (1)

Thalish Sajeed

Reputation: 1351

Your code never writes a newline, so every word lands on the same row. Write a newline character after each word:

    # Open the output file once instead of reopening it for every word.
    with open('stopwords_soccer.csv', 'a', encoding='utf-8') as appendFile:
        for r in words:
            if r not in stop_words:
                appendFile.write(r)
                appendFile.write("\n")  # newline after each kept word

This writes each remaining word on its own line instead of placing everything on the top row.
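If the goal is to keep the original three-column layout, a newline per word will not restore it, because `read().split()` has already discarded the row and column boundaries. One possible approach, sketched here under assumptions, is to read and write with the standard `csv` module and filter each cell separately. The function name `remove_stopwords_from_csv` is hypothetical, the input is assumed to be a regular comma-separated file, and the comparison is done in lower case because NLTK's English list is lower case.

```python
import csv


def remove_stopwords_from_csv(src_path, dst_path, stop_words):
    """Copy src_path to dst_path, dropping stop words from every cell.

    Hypothetical helper for illustration; stop_words is any set of
    lower-case words.
    """
    # newline='' lets the csv module handle line endings itself.
    with open(src_path, newline='', encoding='utf-8') as src, \
         open(dst_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            cleaned_row = []
            for cell in row:
                # Keep only the words that are not stop words.
                kept = [w for w in cell.split() if w.lower() not in stop_words]
                cleaned_row.append(" ".join(kept))
            # One writerow call per input row preserves the column layout.
            writer.writerow(cleaned_row)


# Usage with NLTK (assumes the corpus was fetched with
# nltk.download('stopwords')):
#
# from nltk.corpus import stopwords
# remove_stopwords_from_csv('soccer.csv', 'stopwords_soccer.csv',
#                           set(stopwords.words('english')))
```

Note that this sketch opens the output in `'w'` mode, so rerunning it overwrites the file rather than appending to it, and that a header row, if present, is filtered like any other row.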

Upvotes: 1
