RJames

Reputation: 117

Text being written to single line when removing stopwords from columns

I'm trying to remove stopwords from a tab-delimited .txt file using the following code:

import io
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


file = open('textposts_01.txt', encoding='UTF-8')
stop_words = set(stopwords.words('english'))
line = file.read()
words = line.split()
for r in words:
    if not r in stop_words:
        appendFile = open('textposts_02.txt', mode='a', encoding='UTF-8')
        appendFile.write(" "+r)
        appendFile.close()

The code executes successfully, but when I view the results, all of the rows have been rewritten onto a single line. How can I maintain the columns while removing the stopwords?

I found the following solution on a similar post:

import io
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

file = open('textposts_01.txt', encoding='UTF-8')
stop_words = set(stopwords.words('english'))
line = file.read()
words = line.split()
for r in words:
    if not r in stop_words:
        appendFile = open('textposts_02.txt', mode='a', encoding='UTF-8')
        appendFile.write(" "+r)
        appendFile.write("\n")
        appendFile.close()

But inserting a newline simply started a new line after every word, so that if I started with a row like this:

0     make a list of every person you know

the results looked like this:

0
make
list
every
person
know

and I need the results in rows like so:

0     make list every person know

I've been searching for a while, but haven't found a solution.

Upvotes: 0

Views: 54

Answers (2)

Sam Chats

Reputation: 2311

You can loop over the file and add a newline once you're done with each line.

Also, reading all of the file at once is not a very memory-friendly approach. The following is a better and safer approach:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
with open('textposts_01.txt', encoding='UTF-8') as f:
    with open('textposts_02.txt', mode='a', encoding='UTF-8') as append_file:
        for line in f:
            # write only the non-stopwords from this input line
            for r in line.split():
                if r not in stop_words:
                    append_file.write(" " + r)
            # end the output row only after the whole input line is processed
            append_file.write("\n")
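
If you also want to avoid the leading space at the start of each output line, a minimal variant (just a sketch; it uses mode='w' so reruns overwrite rather than append to old output) is to collect the kept words and join them:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
with open('textposts_01.txt', encoding='UTF-8') as f:
    with open('textposts_02.txt', mode='w', encoding='UTF-8') as out:
        for line in f:
            # keep only the words that are not stopwords
            kept = [r for r in line.split() if r not in stop_words]
            # one output row per input row, words separated by single spaces
            out.write(" ".join(kept) + "\n")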

Upvotes: 1

tripleee

Reputation: 189936

appendFile.write(" "+r)

will simply write the word without a newline. You probably want

appendFile.write(r + "\n")

instead.
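
For example (a minimal sketch with a hypothetical filename, just to show that write() never adds a newline on its own):

with open('demo.txt', mode='w', encoding='UTF-8') as out:
    out.write("first")        # no newline is added automatically
    out.write("second\n")     # the file now contains "firstsecond\n"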

Upvotes: 2
