Sam Willey

Reputation: 11

Why does the function "len()" return an answer that is 1 character longer than the actual string?

I have created a Python program that removes words from a list if they are not a certain length. I have set up a for loop that cycles through my list and checks whether each word has a length of 3 or greater. My code is as follows:

import string

text_file = open("ten-thousand-english-words.txt", "r")
lines = text_file.readlines()
text_file.close()

open('SortedWords.txt', 'w').close()
for i in lines:
    print(len(i))
    if len(i) >= 4:
        sortedFile = open("SortedWords.txt", "a")  # append mode
        sortedFile.write(i)
sortedFile.close()

I wanted to create a new file that only copies the word over if it is 3 characters or longer.

For some reason it reads all the words in the list as 1 character longer than they actually are (e.g. the word “Hello” would return a length of 6 even though the number of letters is 5).

I fixed this by making it so that the length it looks for is 4 instead of 3, and it worked properly. I couldn't find any information about this issue online, so I decided to post this in case anyone knows why this happens.

Upvotes: 0

Views: 1791

Answers (3)

MarianD

Reputation: 14131

Your program can be as simple as this:

with open("ten-thousand-english-words.txt", "r") as lines:
    with open("SortedWords.txt", "w") as sortedFile:
        for line in lines:
            if len(line) >= 4:
                sortedFile.write(line)

An analysis of your program and an explanation of mine:

  1. Others have explained why the lengths are one longer than expected, so you were right to use >= 4 instead of >= 3.

  2. Your import string is unnecessary: nothing in the program uses the string module.

  3. Your command

    open('SortedWords.txt', 'w').close()
    

    is unnecessary once you make the changes I describe below. It opens the file and immediately closes it, which

    • creates an empty file if it doesn't already exist,
    • empties its content if it does.

    Opening the file once in write mode has the same effect.

    It seems the only reason for it is your later code, which repeatedly opens the file in append mode:

    if len(i) >= 4:
        sortedFile = open("SortedWords.txt", "a")     # append mode
        sortedFile.write(i)                 
    

    But:

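    • Calling open() on every iteration does not reuse the file that is already open. It creates a new file object each time, and your final sortedFile.close() explicitly closes only the last one.

    • Why open the file repeatedly? You can simply open it once in write mode and then write to it in your loop: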

      sortedFile = open("SortedWords.txt", "w")     # NO append mode - WRITE mode
      for i in lines:
          if len(i) >= 4:                           # each line still ends with "\n"
              sortedFile.write(i)
      sortedFile.close()
      
  4. Instead of closing the file manually, use a context manager (i.e. the with statement), which closes it automatically:

    with open(...) as f:
        ...
        ...
    
    # once the with block ends, the file is closed automatically
    
  5. To make the program more robust, remove any whitespace before or after each word (including the \n) with the .strip() method.

    In this case

    • use the >= 3 comparison,
    • add the \n back when writing the word (i.e. the line) to the file:

    with open("ten-thousand-english-words.txt", "r") as lines:
        with open("SortedWords.txt", "w") as sortedFile:
            for line in lines:
                line = line.strip()
                if len(line) >= 3:
                    sortedFile.write(line + "\n")
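
The effect of points 3 and 4 can be checked with a small sketch. It uses a temporary scratch file (the path is made up for this demonstration) rather than your real word list, and shows that write mode truncates on its own and that the with statement closes the file:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "SortedWords.txt")

# First run: write two words.
with open(path, "w") as f:
    f.write("apple\nbanana\n")

# Second run: "w" mode truncates the existing file by itself,
# so no separate open(path, "w").close() step is needed first.
with open(path, "w") as f:
    f.write("cow\n")

print(f.closed)       # True: leaving the with block closed the file

with open(path) as f:
    content = f.read()
print(repr(content))  # 'cow\n': the earlier content is gone
```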
    

Upvotes: 0

Yehonatan harmatz

Reputation: 449

Every line read from the file ends with '\n' (except possibly the last one, if the file has no trailing newline), and that's the '+1' you see.
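
A quick way to see it, using a hard-coded string in place of a line read from your file:

```python
line = "Hello\n"                 # stand-in for one item returned by readlines()

print(len(line))                 # 6: five letters plus the newline character
print(len(line.rstrip("\n")))    # 5: the newline has been removed
```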

Upvotes: 0

Jay Mody

Reputation: 4033

Each line in a file has a "\n" at the end of it, which marks the end of the line. We can't see this character in a text editor, since the editor renders it as a line break, but rest assured it's there. When you read a file in Python using readlines(), this "\n" character is preserved. This is why you are getting a length of 1 more than expected.

Here's some code to understand what's going on:

somefile.txt

apple
banana
cow

script.py

with open("somefile.txt") as fi:
    for line in fi.readlines():
        print(repr(line))
# Output:
# 'apple\n'
# 'banana\n'
# 'cow\n'

The repr function in Python returns the literal representation of the string (i.e. it doesn't turn "\n" into a line break, it shows it as is). If we didn't use repr before printing, our output would be:

apple

banana

cow

Notice the extra blank line after each string: Python prints 2 newline characters, 1 from the string itself and 1 that the print function adds at the end by default.
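
The second newline comes from print's end parameter, which defaults to "\n". A small sketch, with a hard-coded list standing in for the lines of somefile.txt, shows that passing end="" leaves only the newline already in each string:

```python
lines = ["apple\n", "banana\n", "cow\n"]  # stand-in for fi.readlines()

for line in lines:
    print(line, end="")  # end="" stops print from appending its own "\n"
```

This only changes what is displayed. The "\n" is still part of each string, so len(line) is unchanged.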

To get rid of the newline character, we can use my_string.strip(), which removes any leading or trailing whitespace:

with open("somefile.txt") as fi:
    for line in fi.readlines():
        print(repr(line.strip()))
# Output:
# 'apple'
# 'banana'
# 'cow'
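
Keep in mind that strip() removes all leading and trailing whitespace, not just the newline. If only the line ending should go, rstrip("\n") is narrower. A small sketch with a made-up string that has extra spaces around the word:

```python
line = "  cow \n"                       # made-up line with surrounding spaces

stripped = line.strip()                 # removes spaces and the newline
newline_only = line.rstrip("\n")        # removes only the trailing newline

print(repr(stripped))       # 'cow'
print(repr(newline_only))   # '  cow '
```

For a word list either works, but strip() is the safer choice if stray spaces should not count toward the length.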

Upvotes: 3
