Reputation: 461
I've written a short script to read from a file which contains information about articles from a blog. Each line in the file corresponds to one article, and the tab-separated columns hold information such as an article 'id', title and paragraph.
id  title                              paragraph
1   Motorola prototypes from Frog      Some cool looking concepts for phones, watches etc
2   Digital everything                 This new york times article talks about the willingness of consumers
3   E-mails banned at summer camps     E-mails compound feelings of homesickness in kids
4   Simple Multimedia Websites/e-mail  This is a sort of website/e-mail generation site
5   Campground wi-fi                   Wi-fi is now on the list of amenities offered at many campgrounds
6   Fog screen                         Literally, a screen made by projecting onto fog
This code splits the file contents on '\n' so that each article is an element in a list:
# Open the file and skip the first line (headers)
with open("RBArticlesTabClean.txt", "r", encoding="utf-8") as file:
    file.readline()
    # Read and decode the rest of the file
    articlesFile = htmlcodes.decodeString(file.read()).lower()
# Split the text into its lines
articlesFileList = articlesFile.split("\n")
To test that this is working and that the program is reading the file correctly, I iterate through the list of articles obtained, and print the whole thing out:
for each in articlesFileList:
    input(each)
When running this in IDLE, it works as expected, printing out each line (in lowercase) every time the user presses the enter key.
However, when the script is run through the command prompt, it fails after printing three articles, with this error:
Traceback (most recent call last):
  File "E:\Python\RBTrends\RBTrendsAnalysis.py", line 52, in <module>
    print(each)
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 89: character maps to <undefined>
I have two questions:
1) Why do I receive this error?
2) Why is there a difference between running the program in IDLE and in the command prompt?
Upvotes: 1
Views: 59
Reputation: 2295
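As far as I know, IDLE is capable of displaying Unicode characters, while the command prompt encodes everything you print using its console code page. Your traceback shows that code page is cp850, which covers more than plain ASCII but has no mapping for '\u2019' (the right single quotation mark, a "curly" apostrophe). That is the reason you are encountering this error, and also why the same script behaves differently in the two environments.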
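To see the problem in isolation, here is a minimal sketch. It assumes cp850 as the console encoding (taken from the traceback), and `console_safe` is a hypothetical helper name of my own, not something from the standard library. It round-trips a string through the console encoding and replaces anything that cannot be encoded with '?', so printing no longer raises.

```python
import sys

# Sample text containing U+2019 (RIGHT SINGLE QUOTATION MARK),
# the character named in the traceback.
text = "it\u2019s a test"

# Strict encoding to cp850 fails because cp850 has no mapping for U+2019.
# This is the same failure print() hits on a cp850 console.
try:
    text.encode("cp850")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False


def console_safe(s, encoding=None):
    """Hypothetical helper: replace characters the target encoding cannot
    represent with '?', so that printing the result does not raise."""
    # Fall back to ASCII if stdout reports no encoding (an assumption).
    encoding = encoding or sys.stdout.encoding or "ascii"
    return s.encode(encoding, errors="replace").decode(encoding)


# Explicit cp850 to mimic the asker's console.
safe = console_safe(text, "cp850")

# In a real script, let the helper pick up the actual console encoding.
print(console_safe(text))
```

In your loop that would be something like `print(console_safe(each))`. Printing `sys.stdout.encoding` in both IDLE and the command prompt should also show the difference between the two environments directly.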
Upvotes: 1