Reputation: 19
I have a file, test.tsv, that contains the special character "\u202f".
When I wrote a Python script to read this file, I found that .readlines() can read this character, but .read() apparently cannot.
And when I print lines1[0], the "\u202f" disappears.
Why?
Code:
ff = "test.tsv"
lines1 = open(ff, encoding='utf-8').readlines()
str1 = open(ff, encoding='utf-8').read()
print("lines1:", lines1)
print("lines1[0]:", lines1[0])
print("str1:", str1)
Output:
lines1: ['assume Fourbooks\u202f è una piattaforma\n']
lines1[0]: assume Fourbooks è una piattaforma
str1: assume Fourbooks è una piattaforma
Upvotes: 1
Views: 3177
Reputation: 394
First of all, both readlines() and read() are reading your special character.
readlines() reads each line as it appears in the file and returns the lines as a list of strings, whereas read() reads the entire content of the file and returns it as a single string.
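A quick way to confirm that both calls return the character is to test for it directly instead of relying on what print() shows. This is only a minimal sketch: the helper name read_both_ways is made up for illustration, and the temporary file stands in for the test.tsv from the question.

```python
import os
import tempfile


def read_both_ways(path):
    """Hypothetical helper: return (lines, text) via readlines() and read()."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return lines, text


# Build a throwaway file that mirrors the question's test.tsv (assumed content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.tsv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("assume Fourbooks\u202f è una piattaforma\n")
    lines, text = read_both_ways(path)

print("\u202f" in lines[0])  # True: readlines() kept the character
print("\u202f" in text)      # True: read() kept it as well
print(lines[0] == text)      # True: the single line equals the whole file
```

If both membership tests print True, the data is identical and only the way it is displayed differs.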
If you look closely at your output, you will notice that when printing lines1 the character appears as the escape sequence \u202f, not as the character itself. When you print lines1[0] and str1, the character itself is printed, and it renders as whitespace (U+202F is a narrow no-break space).
The actual reason for the difference is that printing a list shows the repr() of each element, so print(lines1) calls __repr__ on the string, which escapes non-printable characters such as \u202f. In the other two cases, print(lines1[0]) and print(str1), the str object is printed directly, so __str__ is used and the character is written unchanged, as mentioned in the comments by MZ.
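The difference between the two representations can be reproduced without any file. The snippet below is a small sketch using a sample string of my own (not the full line from the question); it only relies on built-ins and the standard unicodedata module.

```python
import unicodedata

s = "Fourbooks\u202f"  # sample string ending in the special character

print(unicodedata.name("\u202f"))  # NARROW NO-BREAK SPACE
print("\u202f".isprintable())      # False, so repr() escapes it

print(repr(s))  # 'Fourbooks\u202f'   (escaped form, via __repr__)
print([s])      # ['Fourbooks\u202f'] (a list shows the repr of each element)

# str() leaves the string untouched, which is why print(s) would write the
# raw character and the terminal would render it as a narrow space.
print(str(s) == s)  # True
print(len(s))       # 10: the character is still part of the string
```

To make hidden characters visible while debugging, print(repr(lines1[0])) or print(ascii(str1)) shows the escaped form for a single string too.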
Upvotes: 2