maclura

Reputation: 3

Copy sharepoint binary files to OneLake with Pyspark

I am trying to develop a general-purpose pipeline that ingests into a OneLake Fabric folder all the files contained in a SharePoint Online folder, with no transformation: a 1-to-1 copy of those files (in my case, .xlsx files).

Since Fabric Web/HTTP connections cannot be parameterized (yet), I am trying to do this in a notebook written in PySpark, so I can pass all the customized parameters from the pipeline to the notebook and get a truly parametric execution. Parameters like, e.g.:

[image: example of parameter list]

What I've done up to now:

  1. Get an access token for the SharePoint tenant for my app registered in Entra ID

  2. Connect to SharePoint Online with the SharePoint REST API and get the SharePoint folder contents (using the requests module)

  3. Download each file from that folder (using the requests module) and try to write it to OneLake in binary format.
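For context, steps 1 and 2 above could be sketched roughly like this. The client-credentials flow and the shape of the SharePoint REST endpoint are assumptions about the poster's setup; all tenant, site, and folder names are placeholders:

```python
import requests


def sharepoint_folder_files_url(site_url: str, folder_path: str) -> str:
    """Build the SharePoint REST endpoint that lists the files in a
    server-relative folder (assumed endpoint shape)."""
    return (f"{site_url}/_api/web/GetFolderByServerRelativeUrl"
            f"('{folder_path}')/Files")


def get_access_token(tenant_id: str, client_id: str,
                     client_secret: str, resource: str) -> str:
    """Acquire a token via the client-credentials flow against Entra ID
    (assumes an app registration with application permissions)."""
    token_url = (f"https://login.microsoftonline.com/{tenant_id}"
                 f"/oauth2/v2.0/token")
    data = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{resource}/.default",
    }
    resp = requests.post(token_url, data=data)
    resp.raise_for_status()
    return resp.json()["access_token"]
```

The listing call would then be a `requests.get` against the built URL with an `Authorization: Bearer <token>` header.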

Now I am stuck on writing to OneLake with Python file I/O.

When I get my binary file from the REST API call, I tried this code:

response = requests.get(url, headers=headers)
bytes_stream = io.BytesIO(response.content)
with open(file_path, "wb") as file:
    file.write(bytes_stream.read())
bytes_stream.close()

but I always get the same error:

FileNotFoundError: [Errno 2] No such file or directory:

The path and filename are correct, with both absolute and relative paths, and writing and reading text files works correctly when using notebookutils.

Any help is greatly appreciated.

Upvotes: 0

Views: 144

Answers (1)

David Browne - Microsoft

Reputation: 89361

The notebook's default lakehouse is mounted into the filesystem at /lakehouse/default/, so

file_path = '/lakehouse/default/Files/SomeFolder/SomeFile.xlsx'

and Bob's your uncle.

Upvotes: 1
