APorter1031

Reputation: 2256

Python Decompressing gzip csv in pandas csv reader

The following code works in Python 3 but fails in Python 2:

import gzip
import pandas as pd
import requests

def get_historical():
    r = requests.get("http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz", stream=True)
    decompressed_file = gzip.GzipFile(fileobj=r.raw)
    data = pd.read_csv(decompressed_file, sep=',')
    data.columns = ["timestamp", "price", "volume"]  # set df col headers
    return data

The error I get in Python2 is the following:

TypeError: 'int' object has no attribute '__getitem__'

The error is on the line where I set data equal to pd.read_csv(...)

It seems to be a pandas error to me.

Stacktrace:

Traceback (most recent call last):
  File "fetch.py", line 51, in <module>
    print(f.get_historical())
  File "fetch.py", line 36, in get_historical
    data = pd.read_csv(f, sep=',')
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 760, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2197, in pandas._libs.parsers.raise_parser_error
io.UnsupportedOperation: seek

Upvotes: 1

Views: 1091

Answers (1)

Abdou

Reputation: 13274

The traceback you posted shows that the problem is the Response object's raw attribute: it is a file-like object that does not support the .seek method that ordinary file objects provide. When pd.read_csv ingests the file object, pandas (in Python 2) appears to call the seek method of the object it is given, which raises io.UnsupportedOperation.

You can confirm that the returned response's raw data is not seekable by calling r.raw.seekable(), which should normally return False.
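As a rough illustration that needs no network access, the sketch below uses a hypothetical NonSeekableStream class as a stand-in for r.raw (it is not part of requests). It only assumes that the stream can be read but not rewound, and it shows that copying the bytes into an io.BytesIO gives an object that is seekable.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for r.raw: readable, but not seekable."""

    def __init__(self, payload):
        self._source = io.BytesIO(payload)

    def readable(self):
        return True

    def readinto(self, b):
        # Copy up to len(b) bytes into the caller's buffer.
        chunk = self._source.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


stream = NonSeekableStream(b"1314964052,2.60,0.4\n")
raw_is_seekable = stream.seekable()  # io.RawIOBase reports False by default

try:
    stream.seek(0)
    seek_failed = False
except io.UnsupportedOperation:
    # Same exception type as at the bottom of the traceback.
    seek_failed = True

# Reading everything into memory yields a seekable buffer.
buffered = io.BytesIO(stream.read())
buffered_is_seekable = buffered.seekable()
```

With a real response you would call r.raw.seekable() directly instead of using the stand-in class.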

One way to work around this is to wrap the returned data in an io.BytesIO object, as follows:

import gzip
import io
import pandas as pd
import requests

# file_url = "http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz"
file_url = "http://api.bitcoincharts.com/v1/csv/aqoinEUR.csv.gz"
r = requests.get(file_url, stream=True)
dfile = gzip.GzipFile(fileobj=io.BytesIO(r.raw.read()))
data = pd.read_csv(dfile, sep=',', header=None)  # header=None gives the integer column labels shown below

print(data)

            0     1    2
0  1314964052  2.60  0.4
1  1316277154  3.75  0.5
2  1316300526  4.00  4.0
3  1316300612  3.80  1.0
4  1316300622  3.75  1.5

As you can see, I used a smaller file from the directory of available files; you can switch it back to the file you want. In any case, io.BytesIO(r.raw.read()) is seekable, so it should avoid the io.UnsupportedOperation exception you are encountering.

As for the TypeError exception, it does not occur in this snippet of code.
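The buffering step can also be tried offline. This is only a sketch: open_buffered_gzip is a hypothetical helper name, the gzip payload is built in memory from two rows copied from the output above, and a plain io.BytesIO stands in for r.raw.

```python
import gzip
import io


def open_buffered_gzip(raw_stream):
    """Read the whole compressed stream into memory and return a GzipFile
    backed by a seekable io.BytesIO (hypothetical helper, not a library API)."""
    return gzip.GzipFile(fileobj=io.BytesIO(raw_stream.read()))


# Build a small gzip payload in memory instead of downloading one.
sample_csv = b"1314964052,2.60,0.4\n1316277154,3.75,0.5\n"
compressed = io.BytesIO()
with gzip.GzipFile(fileobj=compressed, mode="wb") as gz:
    gz.write(sample_csv)

# The BytesIO here stands in for r.raw; any object with .read() would do.
dfile = open_buffered_gzip(io.BytesIO(compressed.getvalue()))

first_line = dfile.readline()
dfile.seek(0)  # rewinding works because the backing buffer is seekable
all_bytes = dfile.read()
```

The object returned by the helper can be passed to pd.read_csv in the same way as dfile in the snippet above. Keep in mind that raw_stream.read() loads the entire compressed file into memory, which is fine for files of this size but may matter for very large downloads.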

I hope this helps.

Upvotes: 3
