Reputation: 181
I have some text files I need to process (but not extract) from a tar archive. I have working Python 2 code which I am trying to port to Python 3. Unfortunately, Python 3 returns byte strings, which the rest of the code cannot process correctly, so I need to convert them to strings. A simple example looks like this:
import tarfile

with tarfile.open("file.tar") as tar:
    with tar.extractfile("test.txt") as extracted:
        lines = extracted.readlines()
        print(lines)
The result is:
['a\n', 'test\n', 'file\n'] # python 2
[b'a\n', b'test\n', b'file\n'] # python 3
Below are my current attempts at a fix. They all work, but it feels awkward to need a triple with statement, a list comprehension, or map just to read some text:
with io.TextIOWrapper(extracted) as txtextracted:
    lines = txtextracted.readlines()
# or
lines = [i.decode("utf-8") for i in lines]
# or
lines = list(map(lambda x: x.decode("utf-8"), lines))
I cannot find a neater solution in the io.BufferedReader documentation (this is the object which TarFile.extractfile returns). I have tried to come up with solutions, but none are as neat as the Python 2 version. Is there a neat and Pythonic way to read the tar file's io.BufferedReader object as strings?
Upvotes: 0
Views: 1355
Reputation: 61527
The with statement allows for multiple context managers, and as it turns out, the construction of each one may depend on the previous ones in the chain. For example:
class manager:
    def __init__(self, name, child=None):
        self.name, self.child = name, child

    def __exit__(self, t, value, traceback):
        print('exiting', self)

    def __enter__(self):
        print('entering', self)
        return self

    def __str__(self):
        childname = None if self.child is None else f"'{self.child.name}'"
        return f"manager '{self.name}' with child {childname}"
Testing it:
>>> with manager('x') as x, manager('y', x) as y, manager('z', y) as z: pass
...
entering manager 'x' with child None
entering manager 'y' with child 'x'
entering manager 'z' with child 'y'
exiting manager 'z' with child 'y'
exiting manager 'y' with child 'x'
exiting manager 'x' with child None
Thus:
import io
import tarfile

with tarfile.open("file.tar") as tar, tar.extractfile("test.txt") as binary, io.TextIOWrapper(binary) as text:
    lines = text.readlines()
(Although I don't think you really need to manage all those contexts anyway...)
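To illustrate that last remark, here is a minimal sketch that manages only two contexts. It relies on the fact that closing an io.TextIOWrapper also closes the binary stream it wraps, so the object returned by extractfile does not need its own with clause. The helper name read_text_lines and the UTF-8 default are my own assumptions, not something from the question; pass whatever encoding your files actually use.

```python
import io
import tarfile


def read_text_lines(tar_path, member, encoding="utf-8"):
    """Return the lines of a text member of a tar archive as str objects.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    with tarfile.open(tar_path) as tar:
        binary = tar.extractfile(member)
        # extractfile returns None for members that are not regular files
        # or links (for example, directories).
        if binary is None:
            raise ValueError(f"{member!r} is not a regular file in the archive")
        # Closing the text wrapper also closes the underlying binary reader.
        with io.TextIOWrapper(binary, encoding=encoding) as text:
            return text.readlines()
```

With the archive from the question, read_text_lines("file.tar", "test.txt") should give the same list of str lines that the Python 2 code produced. Passing the encoding explicitly avoids depending on the platform's default locale encoding.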
Upvotes: 3