Reputation: 95
I'm trying to get the MIME type of some archived files, then read and parse them, using the following code:
archive_file.tar.gz ---> file.csv, file.json, file.xlsx etc.
import tarfile

def parse_tar_gzip(element):
    from my_lib import parse_file
    from my_lib import NestedArchives
    try:
        tar = tarfile.open(fileobj=element, mode="r")
    except tarfile.ReadError:
        raise NestedArchives(element)
    else:
        for mem in tar.getmembers():
            if mem.isfile():
                my_mems = mem.name.split("/")[-1]
                if not my_mems.startswith("."):
                    my_file = tar.extractfile(mem)
                    # my_mime = mimetypes.guess_type(my_file)
                    print(my_file)
                    # yield "", parse_file(my_file)

with open('/Users/my_name/Downloads/archive_file.tar.gz', 'rb') as my_files:
    blabla = parse_tar_gzip(my_files)
    print(blabla)
The problem is that my_file is returned as an ExFileObject having the name archive_file.tar.gz instead of the name of the files inside the archive (e.g. file.json or file.xlsx), as below:
<ExFileObject name='/Users/my_name/Downloads/archive_file.tar.gz'>
<ExFileObject name='/Users/my_name/Downloads/archive_file.tar.gz'>
<ExFileObject name='/Users/my_name/Downloads/archive_file.tar.gz'>
<ExFileObject name='/Users/my_name/Downloads/archive_file.tar.gz'>
Shouldn't extractfile return the name of the files inside the archive? This is very strange, because when I was using Python 2.x the file names were there...
Upvotes: 2
Views: 1004
Reputation: 155333
The ExFileObject is constructed from the underlying file handle to the tarball, without knowing the member being extracted (it's just told the offset, size and sparseness of the member being extracted). So it doesn't know the name of the thing being extracted; it only has the name of the original tarball, as shown.
Given that .name is supposed to tell you the file system name of the open file object, this is arguably correct, if somewhat misleading: you don't have a handle to an actual file system object based on the member name, just a handle to the tarball itself. You have access to the name at the moment you call extractfile, so just hold on to that information if you need it. The point of extractfile is to get the data, not the name it was stored under, after all.
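As a minimal sketch of "holding on to the name", the member's name can be taken from the TarInfo object and carried alongside the extracted data, and the MIME type can be guessed from that name rather than from the file object. The helper names iter_tar_members and build_demo_archive are hypothetical, and the in-memory archive only stands in for archive_file.tar.gz so the example runs on its own:

```python
import io
import mimetypes
import posixpath
import tarfile


def iter_tar_members(fileobj):
    """Yield (name, mime_type, data) for each visible regular file in a tarball."""
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # The name lives on the TarInfo member, not on the extracted file object.
            base_name = posixpath.basename(member.name)
            if base_name.startswith("."):
                continue
            # guess_type works on a name or URL, so pass the member name.
            mime_type, _ = mimetypes.guess_type(base_name)
            extracted = tar.extractfile(member)
            yield base_name, mime_type, extracted.read()


def build_demo_archive():
    """Build a small gzip-compressed tarball in memory (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, payload in [("data/file.json", b"{}"), ("data/.hidden", b"x")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


results = list(iter_tar_members(build_demo_archive()))
for name, mime_type, data in results:
    print(name, mime_type, len(data))
```

With a real archive, the same generator could be fed the handle from open(path, 'rb'), and the yielded name could be passed on to whatever parser is chosen for that file type.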
Upvotes: 2