Reputation: 11
I have created a Python program that removes words from a list if they are not a certain length. I have set up a for
loop that cycles through my list and checks if each word is a length of 3 or greater. My code is as follows:
import string
text_file = open("ten-thousand-english-words.txt", "r")
lines = text_file.readlines()
text_file.close()
open('SortedWords.txt', 'w').close()
for i in lines:
    print(len(i))
    if len(i) >= 4:
        sortedFile = open("SortedWords.txt", "a") # append mode
        sortedFile.write(i)
        sortedFile.close()
I wanted to create a new file that only copies the word over if it is 3 characters or longer.
For some reason it reads all the words in the list as 1 character longer than they actually are (e.g. the word “Hello” would return a length of 6 even though the number of letters is 5).
I fixed this by making it so that the length it looks for is 4 instead of 3, and it worked properly. I couldn't find any information about this issue online, so I decided to post this in case anyone knows why this happens.
Upvotes: 0
Views: 1791
Reputation: 14131
Your program may be as simple as
with open("ten-thousand-english-words.txt", "r") as lines:
    with open("SortedWords.txt", "w") as sortedFile:
        for line in lines:
            if len(line) >= 4:
                sortedFile.write(line)
An analysis of your program, and an explanation of mine:
Other people have explained why the lengths are longer, so instead of >= 3 you correctly used >= 4.
Your import string is never used, so you can remove it.
Your command open('SortedWords.txt', 'w').close() is unnecessary (simply remove it and make the changes I describe below): it opens the file and immediately closes it, effectively just creating an empty file or emptying an existing one.
It seems that the only reason for doing it is your later command for repeatedly opening an empty file in the append mode:
if len(i) >= 4:
    sortedFile = open("SortedWords.txt", "a") # append mode
    sortedFile.write(i)
But:
Re-opening the file for every single word achieves nothing useful.
Why open / close a file repeatedly? You may simply open it in the write mode and then write to it in your loop:
sortedFile = open("SortedWords.txt", "w") # NO append mode - WRITE mode
for i in lines:
    if len(i) >= 4:
        sortedFile.write(i)
sortedFile.close()
Instead of manually closing an opened file, use a so-called context manager (i.e. the with statement) to close it automatically:
with open(...) as f:
    ...
    ...
# after dedent the file is automatically closed
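A quick way to check this yourself, as a rough sketch: the file name demo.txt and the temporary folder are placeholders made up for this illustration, not part of your program. The closed attribute of a file object tells you whether the file has been closed.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("hello\n")
    inside = f.closed   # the file is still open inside the with block

outside = f.closed      # the file was closed when the block was left

print(inside, outside)  # False True
```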
To make the program more robust, remove any whitespace before or after each word (including the \n) using the .strip() method.
In this case you have to use the >= 3 comparison, and you have to add the \n symbol back when writing a word (i.e. the line) to the file:
with open("ten-thousand-english-words.txt", "r") as lines:
    with open("SortedWords.txt", "w") as sortedFile:
        for line in lines:
            line = line.strip()
            if len(line) >= 3:
                sortedFile.write(line + "\n")
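If you want to try this on a small sample first, the same logic can be wrapped in a function. This is only a sketch: the name filter_words, its min_length parameter and the throwaway sample files are made up for the illustration, and one word per line is assumed.

```python
import os
import tempfile


def filter_words(src_path, dst_path, min_length=3):
    """Copy words of at least min_length characters from src_path to dst_path.

    Hypothetical helper for illustration; one word per line is assumed.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            word = line.strip()          # drop "\n" and surrounding whitespace
            if len(word) >= min_length:  # compare the real word length
                dst.write(word + "\n")   # put the newline back


# Small self-contained trial with throwaway files.
folder = tempfile.mkdtemp()
src_file = os.path.join(folder, "words.txt")
dst_file = os.path.join(folder, "SortedWords.txt")

with open(src_file, "w") as f:
    f.write("a\nto\ncow\nhello\n")

filter_words(src_file, dst_file)

with open(dst_file) as f:
    kept = f.read().splitlines()

print(kept)  # ['cow', 'hello']
```

Keeping the minimum length as a parameter means the comparison states the real word length (3), with no hidden +1 for the newline.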
Upvotes: 0
Reputation: 4033
Each line in a file has a "\n" at the end of it, which indicates a newline. We can't see this character in a text editor, since the editor displays it as a line break, but rest assured it's there. When you read a file in Python using readlines(), this "\n" character is preserved. This is why you are getting a length of 1 more than expected.
Here's some code to understand what's going on:
somefile.txt
apple
banana
cow
script.py
with open("somefile.txt") as fi:
    for line in fi.readlines():
        print(repr(line))
>>> 'apple\n'
>>> 'banana\n'
>>> 'cow\n'
The repr function in Python returns the literal representation of the string (i.e. when we print it, "\n" is shown as is rather than starting a new line). If we didn't use repr before printing, our output would be:
apple

banana

cow
Notice there are extra blank lines between the strings, since Python is printing 2 newline characters: 1 from the string itself, and 1 which the print function adds to the end by default.
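One way to convince ourselves that the second newline comes from print is to pass end="", which tells print not to append anything. A small sketch, with a list made up here to stand in for the lines of somefile.txt:

```python
lines = ["apple\n", "banana\n", "cow\n"]  # stand-in for the lines of somefile.txt

for line in lines:
    print(line, end="")  # end="" stops print from appending its own "\n"

# The text print writes by default versus with end="":
default_text = "".join(line + "\n" for line in lines)
plain_text = "".join(lines)

print(default_text.count("\n"), plain_text.count("\n"))  # 6 3
```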
To get rid of the newline character, we can use my_string.strip(), which removes any trailing or leading whitespace:
with open("somefile.txt") as fi:
    for line in fi.readlines():
        print(repr(line.strip()))
>>> 'apple'
>>> 'banana'
>>> 'cow'
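If we only want to drop the line ending and keep any other whitespace, rstrip("\n") is a narrower alternative to strip(). A quick comparison, using a sample string made up for this illustration:

```python
line = "  cow \n"  # made-up sample with extra spaces around the word

stripped = line.strip()          # removes whitespace on both sides
no_newline = line.rstrip("\n")   # removes only trailing newline characters

print(repr(stripped), len(stripped))      # 'cow' 3
print(repr(no_newline), len(no_newline))  # '  cow ' 6
```

For a word list with one clean word per line, both give the same length, so strip() is fine there.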
Upvotes: 3