Reputation: 1958
I am looking how to read an xlsx file with pandas, with the file hosted on SharePoint. These contents, when shown through reponse.text, are in string but are a binary representation of the file.
PK╚╝ ! #h�╔�╔ �═ �╔[Content_Types].xml ��╔(� ╗ ��[O�@��M�;���1��G% {���.Z�E��Ҧݝ�I{��5�╗"j� ���"╚W�J�I!^_Z�"CR�R�;(� P ��g��U ̸�a!�D�FJ,�`�>�㕱�V?Ɖ ��� �n�}%K��������Pv���k'#�Dv��W� �B0�T�F��U? -?�*_�-K�"� � dM�fb|═"�BndF0x�3UΕ�Nu� ���P�lO�Y�ğ#� �����,g�K#�}�����E=�tD�U�}���O�Q�[��F�|Ix��╚���[H2{�H+╚x�k�]dn�a�╔yZ"N�jͺ�"ih�s�Gn�<j�╚
I would like to know how to read this format into memory so that I can call pd.read_excel with it.
I've tried to use urllib and openpyxl, in this manner:
import openpyxl as excel
import pandas as pd
from io import BytesIO
import urllib
req = urllib.request.Request(url=url, data=payload, headers=headers)
with urllib.request.urlopen(url=req) as reponse:
rsp = reponse.read()
excel.load_workbook(filename=rsp)
But I am getting error 400 Bad Request, from the urllib request module.
The url looks something like this:
Upvotes: 0
Views: 1613
Reputation: 1958
I found a way to do it. the key was to seek back before passing the file to pandas.
file_ext = self.file_name.split('.')[-1]
if file_ext == 'xlsx':
import pandas as pd
from io import BytesIO
xl = bytes(memoryview(response.content))
memfile =BytesIO()
memfile.write(xl)
memfile.seek(0)
df = pd.io.excel.read_excel(memfile, engine='openpyxl')
print(df.head(10))
Upvotes: 1