Reputation: 538
Consider this very simple example.
import codecs
from io import BytesIO

string = b"""# test comment
Some line without comment
# another comment
"""

reader = codecs.getreader("UTF-8")
stream = reader(BytesIO(string))

lines = []
while True:
    # get current position
    position = stream.tell()
    # read first character
    char = stream.read(1)
    # return cursor to start
    stream.seek(position, 0)
    # end of stream
    if char == "":
        break
    # line is not comment
    if char != "#":
        lines.append(stream.readline())
        continue
    # line is comment. Skip it.
    stream.readline()

print(lines)
assert lines == ["Some line without comment\n"]
I am trying to read line by line from a StreamReader: if a line starts with # I skip it, otherwise I store it in a list. But there is some strange behaviour when I use the seek() method. It seems that seek() and readline() don't cooperate and move the cursor somewhere far away, so the resulting list does not contain the expected line.
Of course I could do it in a different way. But as I wrote above, this is a very simple example, and it helps me understand how these things work together.
I use Python 3.5.
Upvotes: 1
Views: 2369
Reputation: 901
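Your code will work if you simply swap
reader = codecs.getreader("UTF-8")
stream = reader(BytesIO(string))
with
stream = BytesIO(string)
Note that in Python 3 a BytesIO yields bytes rather than str, so the comparisons and the final assert then have to use bytes literals (b"", b"#" and b"Some line without comment\n").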
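As a rough sketch of that swap, assuming Python 3 (where BytesIO.read() and readline() return bytes): this is the question's loop with only the comparisons changed to bytes literals. The name data is just illustrative.

```python
from io import BytesIO

data = b"""# test comment
Some line without comment
# another comment
"""

stream = BytesIO(data)

lines = []
while True:
    position = stream.tell()      # byte offset of the current line start
    char = stream.read(1)         # peek at the first byte
    stream.seek(position, 0)      # rewind to the line start
    if char == b"":               # end of stream (bytes, not str)
        break
    if char != b"#":              # not a comment: keep the whole line
        lines.append(stream.readline())
        continue
    stream.readline()             # comment: skip it

print(lines)
```

The stored lines are bytes objects; call .decode("utf-8") on them if text is needed.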
EDIT: If you want to keep using StreamReader, you can get rid of the repositioning with tell() and seek(), as stream.read() and stream.readline() already advance the position on their own. In other words, with your current code you are repositioning twice.
The changed code in the loop:
while True:
    # read first character
    char = stream.read(1)
    # end of stream
    if char == "":
        break
    # line is not comment
    if char != "#":
        lines.append(char + stream.readline())
        continue
    # line is comment. Skip it.
    stream.readline()
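Note the change to lines.append(): read(1) has already consumed the first character of the line, so it has to be prepended to the result of readline().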
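Put together with the setup from the question, the loop can be run as the following sketch. It assumes Python 3; the name data is only illustrative, everything else comes from the code above.

```python
import codecs
from io import BytesIO

data = b"""# test comment
Some line without comment
# another comment
"""

stream = codecs.getreader("utf-8")(BytesIO(data))

lines = []
while True:
    char = stream.read(1)             # consumes the first character of the line
    if char == "":                    # end of stream
        break
    if char != "#":
        # Re-attach the character that read(1) already consumed.
        lines.append(char + stream.readline())
        continue
    stream.readline()                 # comment: discard the rest of the line

print(lines)  # ['Some line without comment\n']
```

No tell() or seek() is involved, so the reader's internal buffer is never reset.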
Upvotes: 1
Reputation: 1121814
You don't want to use codecs stream readers. They are an older, outdated attempt at implementing layered I/O to handle encoding and decoding of text, since superseded by the io module, a much more robust and faster implementation. There have been serious calls for the stream readers to be deprecated.
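One way to see why seek() and tell() do not mix well with these readers is the small sketch below. It assumes CPython's codecs implementation, where readline() reads ahead in chunks and buffers the decoded text, while tell() is simply forwarded to the wrapped byte stream. The names data and raw are illustrative.

```python
import codecs
from io import BytesIO

data = b"# test comment\nSome line without comment\n# another comment\n"

raw = BytesIO(data)
stream = codecs.getreader("utf-8")(raw)

first_line = stream.readline()

print(repr(first_line))   # only the first line is handed back
print(len(first_line))    # its length in characters
print(stream.tell())      # position of the underlying BytesIO, not of the text
print(raw.tell())         # the same number, since tell() is delegated
```

The reported position is further along than the end of the first line, so seeking back to a value obtained this way does not return to the line the reader is logically at.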
You really want to replace your use of codecs.getreader() with the io.TextIOWrapper() object:
from io import BytesIO, TextIOWrapper

string = b"""# test comment
Some line without comment
# another comment
"""

stream = TextIOWrapper(BytesIO(string))
at which point the while loop works and lines ends up as ['Some line without comment\n'].
You also don't need to use seeking or tell() here. You could just loop directly over a file object (including a TextIOWrapper() object):
lines = []
for line in stream:
    if not line.startswith('#'):
        lines.append(line)
or even:
lines = [l for l in stream if not l.startswith('#')]
If you are concerned about the TextIOWrapper() wrapper object closing the underlying stream when you no longer need the wrapper, just detach the wrapper first:
stream.detach()
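As a minimal sketch of how the filtering and the detach step could fit together: the helper name non_comment_lines and its encoding parameter are assumptions made for illustration, not part of the original code.

```python
from io import BytesIO, TextIOWrapper


def non_comment_lines(raw, encoding="utf-8"):
    """Return the lines of a binary stream that do not start with '#'."""
    wrapper = TextIOWrapper(raw, encoding=encoding)
    try:
        return [line for line in wrapper if not line.startswith("#")]
    finally:
        # Detach so that discarding the wrapper does not close raw.
        wrapper.detach()


raw = BytesIO(b"# test comment\nSome line without comment\n# another comment\n")
lines = non_comment_lines(raw)

print(lines)       # ['Some line without comment\n']
print(raw.closed)  # False
```

After detach() the wrapper itself is no longer usable, but the underlying BytesIO stays open and can be rewound or read again.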
Upvotes: 6