Reputation: 518
Trying to read tarfile from a URL
This is mostly about scraping data from a website. I also tried opening the file with gzip, but that produces the same error. Please suggest a solution.
import tarfile
from io import BytesIO
import urllib.request as urllib2
rt = urllib2.urlopen("https://opentender.eu/data/files/CY_ocds_data.json.tar.gz").read()
csvzip = tarfile.open(BytesIO(rt),mode='r:gz')
This produces a TypeError:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-2ed9e3f5bdd6> in <module>()
4 import urllib.request as urllib2
5 rt = urllib2.urlopen("https://opentender.eu/data/files/CY_ocds_data.json.tar.gz").read()
----> 6 csvzip = tarfile.open(BytesIO(rt),mode='r:gz')
7 # csvzip.printdir()
2 frames
/usr/lib/python3.7/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
166 mode += 'b'
167 if fileobj is None:
--> 168 fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
169 if filename is None:
170 filename = getattr(fileobj, 'name', '')
TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO
Upvotes: 1
Views: 1357
Reputation: 11
Sharing an alternative that supports Basic authentication:
import tarfile
from io import BytesIO
import requests
from requests.adapters import HTTPAdapter, Retry
def session_get(url, user, passw):
    # Retry transient failures with exponential backoff before giving up
    session = requests.Session()
    retries = Retry(total=5, backoff_factor=0.5, status_forcelist=[403, 502, 503, 504])
    session.mount('https://', HTTPAdapter(max_retries=retries))
    # Passing a (user, password) tuple as auth uses HTTP Basic authentication
    response = session.get(url, auth=(user, passw))
    if response.status_code != 200:
        raise Exception(f'Unable to access link:\n{url}\nError code: {response.status_code}')
    return response
url = "https://opentender.eu/data/files/CY_ocds_data.json.tar.gz"
resp = session_get(url, '<username>', '<password>')
tf = tarfile.open(fileobj=BytesIO(resp.content), mode="r:gz")
tf.extractfile('<filename>').read()
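If the member name is not known in advance, the archive can be asked for its contents first. The following is a minimal sketch, not part of the answer above: the helper list_members and the file names first.json and second.json are made up for illustration, and a tiny archive built in memory stands in for resp.content so the snippet runs without network access.

```python
import io
import tarfile


def list_members(raw):
    """Return the names of regular files inside gzip-compressed tar bytes."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tf:
        return [m.name for m in tf.getmembers() if m.isfile()]


# Build a tiny archive in memory as a stand-in for resp.content
# (hypothetical file names and contents).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    for name, data in [("first.json", b"{}"), ("second.json", b"[]")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

names = list_members(buf.getvalue())
```

Any of the returned names can then be passed to tf.extractfile in place of the '<filename>' placeholder.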
Upvotes: 0
Reputation: 367
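You have to call tarfile.open with the fileobj keyword argument. The first positional parameter of tarfile.open is name, which is treated as a path on disk, so a BytesIO passed positionally ends up in the built-in open() and raises the TypeError shown in the question: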
csvzip = tarfile.open(fileobj=BytesIO(rt),mode='r:gz')
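To try this without downloading anything, here is a minimal sketch that builds a small .tar.gz in memory and reads it back through fileobj. The helpers make_tar_gz and read_member and the member name data.json are hypothetical, chosen only for illustration; raw plays the role of rt from the question.

```python
import io
import tarfile


def make_tar_gz(files):
    """Build a gzip-compressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding the data
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(raw, name):
    """Open archive bytes via fileobj= and return one member's content."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tf:
        return tf.extractfile(name).read()


# Stand-in for the bytes returned by urlopen(...).read()
raw = make_tar_gz({"data.json": b'{"ok": true}'})
content = read_member(raw, "data.json")
```

The same read_member call should work on the downloaded bytes, provided the name matches a member that actually exists in the archive.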
Upvotes: 0
Reputation: 1303
Maybe this would be better:
import tarfile
import urllib.request as urllib2
rt = urllib2.urlopen("https://opentender.eu/data/files/CY_ocds_data.json.tar.gz")
csvzip = tarfile.open(fileobj=rt,mode='r:gz')
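The urlopen function returns a file-like object, so you can pass it straight to tarfile.open via fileobj without first reading the whole download into memory. Note that the response is not seekable, so if random access to members fails, the stream mode 'r|gz' (with a pipe instead of a colon) is the variant tarfile documents for such sources.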
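A minimal sketch of the stream variant follows, under the assumption that the source only supports read(), as an HTTP response effectively does. The class ReadOnlyStream, the helper iter_files, and the file names are made up for illustration; in real use the object returned by urlopen would take the place of ReadOnlyStream.

```python
import io
import tarfile


class ReadOnlyStream:
    """Hypothetical stand-in for an HTTP response: read() only, no seek()."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size=-1):
        return self._inner.read(size)


def iter_files(stream):
    """Yield (name, content) for each regular file, reading strictly forward."""
    with tarfile.open(fileobj=stream, mode="r|gz") as tf:
        for member in tf:
            if member.isfile():
                # In stream mode each member must be read while it is current
                yield member.name, tf.extractfile(member).read()


# Build a small archive in memory to feed through the non-seekable wrapper
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    for name, data in [("a.json", b"{}"), ("b.json", b"[1, 2]")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

result = list(iter_files(ReadOnlyStream(buf.getvalue())))
```

The trade-off is that stream mode processes members in archive order only; going back to an earlier member is not possible without downloading again.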
Bobby
Upvotes: 1