Reputation: 6066
I am using Python 3.3.0 on 64-bit Windows.
I have a text file as shown below (see the bottom for a MediaFire download link):
hello
-data1:blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
-data2:blah blah blah blah blah blah blah blah blah blah blah
-data3: Empty
-data4: Empty
I'm trying to navigate around the file, so I use .tell() to figure out what my position is. However, when reading through the lines of the file as shown below, I get a very strange result:
f = open("test.txt")
while True:
    a = f.readline()
    print("{} {}".format(repr(a), f.tell()))
    if a == "":
        break
The result:
'hello\n' 7
'\n' 9
'-data1:blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah\n' 18446744073709551714
'\n' 99
'\n' 101
'-data2:blah blah blah blah blah blah blah blah blah blah blah\n' 164
'-data3: Empty\n' 179
'\n' 181
'-data4: Empty' 194
'' 194
What's with the 18446744073709551714 for the 3rd line? Though it looks like an impossible value, f.seek(18446744073709551714) accepts it and apparently does bring me to the end of the 3rd line. I can't figure out why.
EDIT:
Opening in binary mode gives no problems with tell():
f = open("test.txt", "rb")
while True:
    a = f.readline()
    print("{} {}".format(repr(a), f.tell()))
    if a == b"":
        break
The result:
b'hello\r\n' 7
b'\r\n' 9
b'-data1:blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah\r\n' 97
b'\r\n' 99
b'\r\n' 101
b'-data2:blah blah blah blah blah blah blah blah blah blah blah\r\n' 164
b'-data3: Empty\r\n' 179
b'\r\n' 181
b'-data4: Empty' 194
b'' 194
The test.txt text file is downloadable here, just a tiny 194 bytes: http://www.mediafire.com/?1wm4lujb2j48y23
Upvotes: 18
Views: 9308
Reputation: 1
When I see numbers that large with no obvious relation to reality on a computer, the first thing I want to do is see that number in hex.
>>> val = 18446744073709551714
>>> hex(val)
'0x10000000000000062'
Interesting.
0x10000000000000000 is 2 ** 64.
Let's mask it out.
>>> mask = (1 << 64) - 1
>>> val & mask
98
The low 64 bits give 98, which is within one byte of the offset that binary mode reports after the same line (97). It appears, from this small example at least, that tell() uses the bits above bit 63 for its own purposes.
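The masking steps above can be wrapped in a small helper. This is only a sketch: `split_tell_value` is a hypothetical name, and the idea that everything above bit 63 is extra state rather than position is inferred from this one example, not taken from any documentation.

```python
def split_tell_value(value, bits=64):
    """Split a tell() result into (low, high) parts.

    Assumption: the low `bits` bits hold an offset and anything above
    them is extra state. This is inferred from a single example only.
    """
    mask = (1 << bits) - 1
    return value & mask, value >> bits


low, high = split_tell_value(18446744073709551714)
print(low, high)  # prints: 98 1
```

A value that fits in 64 bits comes back unchanged with a high part of 0, so the helper is harmless on "normal" results.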
Upvotes: 0
Reputation: 101929
It's a documented behaviour caused by UNIX-style line endings:
Return the file’s current position, like stdio's ftell().
Note: On Windows, tell() can return illegal values (after an fgets()) when reading files with Unix-style line-endings. Use binary mode ('rb') to circumvent this problem.
The above is taken from the Python 2.7.4 documentation. The Python 3 documentation changed a bit, since there is now a hierarchy of classes that handle I/O, and I can't find this note there. Your test shows that the behaviour didn't change anyway. Also, the Python 3.3 source code has an "XXX Windows support below is likely incomplete" comment before the function called by tell.
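If what you need is real byte offsets, the binary-mode workaround from the quoted note can be sketched as follows. The helper name `line_offsets` and the demo file contents are made up for illustration; the only thing relied on is that tell() on a file opened with 'rb' returns a plain byte count.

```python
import os
import tempfile


def line_offsets(path):
    """Return the byte offset at which each line of `path` starts."""
    offsets = []
    with open(path, "rb") as f:  # binary mode: tell() is a byte count
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            offsets.append(pos)
    return offsets


# Small demo file with Windows line endings (contents are made up).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello\r\n\r\n-data1: Empty")
    demo_path = tmp.name

print(line_offsets(demo_path))  # prints: [0, 7, 9]
os.remove(demo_path)
```

The lines come back as bytes, so they need an explicit .decode() if you want str; that is the price of getting offsets you can do arithmetic on.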
There is an issue in python bug tracker related to this, and the final comment by Catalin Iacob is:
I tried to reproduce this, picked a file on my disk and indeed I got a negative number, but that file has Unix line endings. This is documented at http://docs.python.org/2/library/stdtypes.html#file.tell so probably there's nothing to do then.
As for Armin's report in msg180145, even though it's not intuitive, this matches ftell's behavior on Windows, as documented in the Remarks section of http://msdn.microsoft.com/en-us/library/0ys3hc0b%28v=vs.100%29.aspx. The tell() method on fileobjects is explicitly documented as matching ftell behavior: "Return the file’s current position, like stdio‘s ftell()". So even though it's not intuitive at all, it's probably better to leave it as is. tell() returns the intuitive non zero position when opening with 'a' on Python3 and on Python 2.7 when using io.open so it's fixed for the future anyway.
So it seems like a "wontfix" bug. Someone should probably open an issue (I commented on the existing one), because this fact is not mentioned at all in the Python 3 documentation.
According to Antoine Pitrou, Python 3 doesn't use ftell() at all, hence this seems to be a different bug. Also, the bug is not reproducible in Python 3.2.3 and was probably introduced when fixing this issue (at least, it's the only change I can find to the implementation of tell() between 3.2.3 and 3.3).
Last edit: According to the io module documentation, the tell method of a text file does not necessarily return the number of bytes since the beginning of the file. The returned value is an "opaque number", which means that the only way you can use it is to pass it to seek to get back to that position. Other operations aren't meaningful. The fact that until Python 3.2.3 the value returned was what you'd expect was only an implementation detail.
Note that the information in this section of the documentation is simply wrong and, hopefully, it will be fixed in the future.
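A minimal sketch of using the value only as a bookmark, as described above. The temporary file and its contents are invented for the demonstration; the point is that the result of tell() is handed straight back to seek() without any arithmetic.

```python
import os
import tempfile

# Demo file with Windows line endings (contents are made up).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first\r\nsecond\r\nthird")
    path = tmp.name

with open(path, encoding="ascii") as f:  # text mode
    f.readline()
    bookmark = f.tell()    # opaque: do not add to it or compare it
    expected = f.readline()
    f.read()               # move somewhere else in the file
    f.seek(bookmark)       # the one supported use of the value
    again = f.readline()

os.remove(path)
print(expected == again)  # prints: True
```

Whatever number ends up in bookmark, the second readline() returns the same line as the first, which is all the text-mode API promises.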
Upvotes: 16