rodrigocf

Reputation: 2099

Using a .txt file as input for Python

I have a Python program that requires the user to paste text into it for processing. Like this:

line=(input("Paste text here: ")).lower()

The pasted text comes from a .txt file. To avoid any issues with the code (since the text contains multiple quotation marks), the user has to do the following: type 3 quotation marks, paste the text, and type 3 quotation marks again.

Can all of the above be avoided by having Python read the .txt file directly? If so, how?

Please let me know if the question makes sense.

Upvotes: 0

Views: 14967

Answers (2)

unutbu

Reputation: 880807

In Python 2, just use raw_input to receive the input as a string. The user does not need to type any extra quotation marks.

line=(raw_input("Paste text here: ")).lower()

Note that in Python 2, input(prompt) is equivalent to

eval(raw_input(prompt))

and applying eval to user input is dangerous, since it lets the user evaluate arbitrary Python expressions. A malicious user could delete files or call arbitrary functions, so never use input in Python 2!

In Python 3, input behaves like Python 2's raw_input, so there your code would have been fine.
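To make the difference concrete, here is a small Python 3 sketch. The string typed is a made-up stand-in for what a user might enter, and eval is called explicitly only to imitate what Python 2's input did behind the scenes:

```python
# In Python 2, input(prompt) behaved like eval(raw_input(prompt)).
# Python 3's input() returns the text unchanged; eval is called here
# explicitly only to show what the old behaviour amounted to.
typed = "2 ** 10"            # hypothetical text a user might type

as_text = typed              # what raw_input / Python 3 input would return
as_evaluated = eval(typed)   # what Python 2 input would have returned

print(repr(as_text))
print(repr(as_evaluated))
```

The expression here is harmless, but the same mechanism would run whatever expression the user typed.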

If instead you'd like the user to type the name of the file, then

filename = raw_input("Text filename: ")
with open(filename, 'r') as f:
    line = f.read()
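For Python 3, a rough equivalent might look like the sketch below. The helper name read_lowercased, the temporary demo file, and the UTF-8 default are all assumptions for illustration; in a real program you would pass the name the user typed at the prompt instead.

```python
import os
import tempfile


def read_lowercased(path, encoding="utf-8"):
    """Return the whole file as one lowercased string (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as f:
        return f.read().lower()


# Demo with a throwaway file so the sketch runs on its own.
# In practice: path = input("Text filename: ")
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('He said "Hi" and \'\'\'LEFT\'\'\'.\n')
    demo_path = tmp.name

line = read_lowercased(demo_path)
os.remove(demo_path)
print(line)
```

Because the text is read from the file rather than evaluated, quotation marks inside it need no special handling.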

Troubleshooting:

Ah, you are using Python 3, I see. When you open a file in r mode, Python tries to decode the bytes in the file into a str. If no encoding is specified, it uses locale.getpreferredencoding(False) as the default encoding. Apparently that is not the right encoding for your file. If you know what encoding your file is using, it is best to supply it with the encoding parameter:

open(filename, 'r', encoding=...)
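As an illustration of why the encoding matters, the sketch below writes a file whose bytes are valid cp1252 but not valid UTF-8, then opens it both ways. The choice of cp1252 and the sample text are assumptions; your file may well use something else.

```python
import os
import tempfile

# A curly closing quote encoded as cp1252 is the single byte 0x94,
# which is not a valid byte sequence in UTF-8.
raw = "quote\u201d".encode("cp1252")
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Opening with the wrong encoding raises UnicodeDecodeError on read.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Opening with the encoding the file was written in works.
with open(path, "r", encoding="cp1252") as f:
    text = f.read()

os.remove(path)
print(utf8_ok, repr(text))
```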

Alternatively, a hackish approach which is not nearly as satisfying is to ignore decoding errors:

open(filename, 'r', errors='ignore')

A third option would be to read the file as bytes:

open(filename, 'rb')

Of course, this has the obvious drawback that you'd then be dealing with bytes like \x9d rather than characters like ·.
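A quick sketch of what those two fallbacks give you, using a made-up Latin-1 byte string: errors='ignore' silently drops the byte it cannot decode, while 'rb' keeps every byte but hands you a bytes object instead of a str.

```python
import os
import tempfile

raw = b"caf\xe9 menu"  # 'café menu' in Latin-1; 0xE9 alone is invalid UTF-8
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Undecodable bytes are dropped without any warning.
with open(path, "r", encoding="utf-8", errors="ignore") as f:
    ignored = f.read()

# Binary mode does no decoding at all.
with open(path, "rb") as f:
    as_bytes = f.read()

os.remove(path)
print(repr(ignored))
print(repr(as_bytes))
```

Note that the accented character is simply missing from the first result, which is why ignoring errors is best kept as a last resort.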

Finally, if you'd like some help guessing the right encoding for your file, run

with open(filename, 'rb') as f:
    contents = f.read()
    print(repr(contents))

and post the output.
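If you would rather experiment yourself, one crude approach is to try a few likely encodings in order and see which one decodes without error. The function name and the candidate list below are assumptions, and a successful decode does not prove the encoding is the right one (latin-1, for instance, accepts any byte sequence), so inspect the decoded text by eye.

```python
def first_working_encoding(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes data without error, else None."""
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


# 'contents' would normally come from open(filename, 'rb').read()
contents = b"quote\x94"
print(first_working_encoding(contents))
```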

Upvotes: 2

SethMMorton

Reputation: 48845

You can use the following:

with open("file.txt") as fl:
    file_contents = [x.rstrip() for x in fl]

This will result in the variable file_contents being a list, where each element is one line of your file with the trailing whitespace (including the newline character) stripped off the end.

If you want to iterate over each line of the file, you can do this:

with open("file.txt") as fl:
    for line in fl:
        # Do something with each line, e.g. print it
        print(line.rstrip())

The rstrip() method gets rid of whitespace at the end of a string, and it is useful for getting rid of the newline character.
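One detail worth knowing: rstrip() with no argument removes all trailing whitespace, not just the newline. If trailing spaces or tabs matter, rstrip("\n") removes only newlines. A small sketch with a made-up line:

```python
line = "  indented text \t\n"

stripped_all = line.rstrip()          # drops the space, tab and newline
stripped_newline = line.rstrip("\n")  # drops only the newline

print(repr(stripped_all))
print(repr(stripped_newline))
```

Leading whitespace is untouched in both cases.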

Upvotes: 1
