Reputation: 1289
I have created a .tar file on a Linux machine as follows:
tar cvf test.tar test_folder/
where the test_folder contains some files as shown below:
test_folder
|___ file1.jpg
|___ file2.jpg
|___ ...
I am unable to programmatically extract the individual files within the tar archive using Python. More specifically, I have tried the following:
import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    # img_file contains the object: <ExFileObject name='test_folder/test.tar'>
Here, img_file does not seem to contain the requested image; rather, it appears to contain the source .tar file. I am not sure where I am messing things up. Any suggestions would be really helpful. Thanks in advance.
Upvotes: 5
Views: 5812
Reputation: 14121
Appending 2 lines to your code will solve your problem:
import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    # --------------------- Add this ---------------------------
    with open("img_file.jpg", "wb") as outfile:
        outfile.write(img_file.read())
The explanation:
The .extractfile() method only gives you the content of the requested file (i.e. its data) as a file-like object; it does not write anything to disk.
So you have to do that yourself, by reading the returned content (img_file.read()) and writing it into a file of your choice (outfile.write(...)).
Or, to simplify your life, use the .extract() method instead. See my other answer.
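For larger files, reading everything at once with img_file.read() may use more memory than necessary. Below is a minimal sketch of the same idea that streams the data instead. The helper name copy_member and its parameters are my own, not part of tarfile. It also checks for None, because extractfile() returns None for members that are neither regular files nor links (directories, for example).

```python
import shutil
import tarfile


def copy_member(archive_path, member_name, out_path):
    """Copy a single archive member to out_path (hypothetical helper)."""
    with tarfile.open(archive_path, "r:") as tar:
        # Raises KeyError if member_name is not in the archive.
        src = tar.extractfile(member_name)
        if src is None:
            # Returned for members that are not regular files or links.
            raise ValueError(f"{member_name!r} is not a regular file")
        # The copy must happen while the archive is still open.
        with src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # copies in chunks
```

With the archive from the question, this would be called as copy_member('test.tar', 'test_folder/file1.jpg', 'img_file.jpg').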
Upvotes: 4
Reputation: 14121
You probably wanted to use the .extract() method instead of the .extractfile() method (see my other answer):
import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    tar.extract('test_folder/file1.jpg')  # .extract() instead of .extractfile()
Notes:
Your extracted file will be in the (possibly newly created) folder test_folder under your current directory.
The .extract() method returns None, so there is no need to assign its result (as in img_file = tar.extract(...)).
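If you would rather not extract into the current directory, .extract() also accepts a path argument. Here is a rough sketch; the helper name extract_member and the dest_dir parameter are my own. Newer Python versions also accept a filter argument that restricts what an archive may do during extraction. Since I can't assume which version you are running, the sketch only passes it when tarfile advertises support.

```python
import os
import tarfile


def extract_member(archive_path, member_name, dest_dir="."):
    """Extract one member under dest_dir and return its path (hypothetical helper)."""
    # The filter argument is not available on every Python version,
    # so detect support instead of assuming it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:") as tar:
        tar.extract(member_name, path=dest_dir, **kwargs)
    # The member keeps its archive path, e.g. dest_dir/test_folder/file1.jpg
    return os.path.join(dest_dir, member_name)
```

For example, extract_member('test.tar', 'test_folder/file1.jpg', 'out') should leave the image at out/test_folder/file1.jpg.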
Upvotes: 6
Reputation: 1
This is because extractfile() returns an io.BufferedReader object. It does not write the file to your directory, so all you have in your variable is that reader.
What you can do is extract the file to disk with extract(), then open the extracted file in a separate context manager:
import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    tar.extract('test_folder/file1.jpg')  # writes test_folder/file1.jpg to disk
with open('test_folder/file1.jpg', 'rb') as img:
    data = img.read()  # do something with img. Here img is your image file
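If the goal is to process every file in the archive rather than a single one, the members can also be read without writing anything to disk. This is only a sketch, assuming the archive is small enough to hold in memory; read_all_files is a name I made up for illustration.

```python
import tarfile


def read_all_files(archive_path):
    """Return {member name: bytes} for every regular file (hypothetical helper)."""
    contents = {}
    with tarfile.open(archive_path, "r:") as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directories, links, devices
                with tar.extractfile(member) as f:
                    contents[member.name] = f.read()
    return contents
```

For the archive in the question, the returned dictionary should have keys such as 'test_folder/file1.jpg' and 'test_folder/file2.jpg'.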
Upvotes: -1