Gunay Anach
Gunay Anach

Reputation: 1212

Read xlsx stored on sharepoint location with openpyxl in python?

quick one.

I have XLSX file located on sharepoint drive and cannot open it using openpyxl in python, it works well if it is stored on my local drive.

I tried this.

from openpyxl import load_workbook
wb = load_workbook('https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx')

Throws up this exception:

C:\Anaconda\lib\site-packages\openpyxl\reader\excel.py in load_workbook(filename, use_iterators, keep_vba, guess_types, data_only)
    123     except (BadZipfile, RuntimeError, IOError, ValueError):
    124         e = exc_info()[1]
--> 125         raise InvalidFileException(unicode(e))
    126     wb = Workbook(guess_types=guess_types, data_only=data_only)
    127 

InvalidFileException: [Errno 22] invalid mode ('rb') or filename: 'https://...

Am I missing something? I need to read the content of one of the sheets in python.


EDIT:

Using crussell's advice, I receive 401 UNAUTHORIZED:

import requests
import urllib
from openpyxl import load_workbook
from requests.auth import HTTPBasicAuth

file = "https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx"

username = 'PotatoUser'
password = 'PotatoPassword'

resp=requests.get(file, auth=HTTPBasicAuth(username, password))
print(resp.content)

Seems like sharepoint and requests are not compatible, with both Digest Authentication and Basic Authentication http://docs.python-requests.org/en/latest/user/authentication/

Upvotes: 11

Views: 35896

Answers (3)

crussell
crussell

Reputation: 179

Instead of trying to load directly from a web-address, try using urllib.

import urllib
file = "https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx"
urllib.urlretrieve(file,"test.xlsx")

From further research, requests may be preferred over urllib. Try this:

import requests
from requests.auth import HTTPBasicAuth
file = "https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx"
        
username = 'myUsername'
password = 'myPassword'
        
resp=requests.get(file, auth=HTTPBasicAuth(username, password))
output = open('test.xlsx', 'wb')
output.write(resp.content)
output.close()

To get requests installed:

pip install requests

Upvotes: 4

Ben.T
Ben.T

Reputation: 29635

If the SP is internal, it might work by removing "https:" in the name you put in the load_workbook().

from openpyxl import load_workbook
file = '//content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx'
wb = load_workbook(file)

without authentication if your account at work is directly connected to SP. Otherwise, at my work, we use NTML authentification, that you can do by using HttpNtlmAuth from the library request_ntml.

Let me know if it works or if you are still interested by this issue, I could give you example with request_ntml

Upvotes: 0

Martin Evans
Martin Evans

Reputation: 46759

You probably first need to download it first rather than opening it directly. The following approach should work:

import urllib2
from openpyxl import load_workbook
import StringIO

data = urllib2.urlopen("https://content.potatocompany.com/workspaces/PotatoTeam/Shared Documents/XYZ errors/XYZ Errors_Confirm.xlsx")
xlsx = data.read()
wb = load_workbook(StringIO.StringIO(xlsx))

Python's StringIO could be used to make the downloaded data appear as a file interface.

Upvotes: 0

Related Questions