user8504877
user8504877

Reputation:

How to read Excel File from AWS in Python 3?

the script below runs without any errors in python 2 but in python 3, I get the following error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-11eb1acac6cc> in <module>
     16 response = s3client.get_object(Bucket='db_region_xyz' , Key='Aaa_bbb/Tester.xlsx')
     17 
---> 18 dataset = pd.read_excel(response['Body'])
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py in __init__(self, io, engine)
    819         self._io = stringify_path(io)
    820 
--> 821         self._reader = self._engines[engine](self._io)
    822 
    823     def __fspath__(self):
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py in __init__(self, filepath_or_buffer)
     19         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     20         import_optional_dependency("xlrd", extra=err_msg)
---> 21         super().__init__(filepath_or_buffer)
     22 
     23     @property
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py in __init__(self, filepath_or_buffer)
    348         elif hasattr(filepath_or_buffer, "read"):
    349             # N.B. xlrd.Book has a read attribute too
--> 350             filepath_or_buffer.seek(0)
    351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
AttributeError: 'StreamingBody' object has no attribute 'seek'

What do I have to change in the below script that it will run in python version 3?

import boto3
from io import StringIO
from boto3 import session
import pandas as pd
import numpy as np

session = boto3.session.Session(region_name='region_xyz')
s3client = session.client('s3' , config=boto3.session.Config(signature_version='s3v4'))
response = s3client.get_object(Bucket='db_region_xyz' , Key='Aaa_bbb/Tester.xlsx')

dataset = pd.read_excel(response['Body'])

Regards

Upvotes: 0

Views: 2436

Answers (1)

Abhishek Verma
Abhishek Verma

Reputation: 1729

dataset = pd.read_excel(io.BytesIO(response['Body'].read()))

Upvotes: 1

Related Questions