Playing With BI

Reputation: 421

Writing pandas dataframe to excel in dbfs azure databricks: OSError: [Errno 95] Operation not supported

I am trying to write a pandas dataframe to the local file system in azure databricks:

import pandas as pd
url = 'https://www.stats.govt.nz/assets/Uploads/Business-price-indexes/Business-price-indexes-March-2019-quarter/Download-data/business-price-indexes-march-2019-quarter-csv.csv'
data = pd.read_csv(url)
with pd.ExcelWriter(r'/dbfs/tmp/export.xlsx', engine="openpyxl") as writer:
    data.to_excel(writer)

Then I get the following error message:

OSError: [Errno 95] Operation not supported
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<command> in <module>
      3 data = pd.read_csv(url)
      4 with pd.ExcelWriter(r'/dbfs/tmp/export.xlsx', engine="openpyxl") as writer:
----> 5     data.to_excel(writer)

/databricks/python/lib/python3.8/site-packages/pandas/io/excel/_base.py in __exit__(self, exc_type, exc_value, traceback)
    892
    893     def __exit__(self, exc_type, exc_value, traceback):
--> 894         self.close()
    895
    896     def close(self):

/databricks/python/lib/python3.8/site-packages/pandas/io/excel/_base.py in close(self)
    896     def close(self):
    897         """synonym for save, to make it more file-like"""
--> 898         content = self.save()
    899         self.handles.close()
    900         return content

I read about some limitations for mounted file systems in this post: Pandas: Write to Excel not working in Databricks

But if I got it right, the solution is to write to the local workspace file system, which is exactly what is not working for me.

My user is a workspace admin and I am using a standard cluster with Databricks Runtime 10.4.

I also verified that I can write a CSV file to the same location using data.to_csv.
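For reference, here is a minimal version of the CSV write that succeeds (my assumption is that to_csv writes the file sequentially, while ExcelWriter needs the random-access writes that the /dbfs FUSE mount does not support):

import pandas as pd

url = 'https://www.stats.govt.nz/assets/Uploads/Business-price-indexes/Business-price-indexes-March-2019-quarter/Download-data/business-price-indexes-march-2019-quarter-csv.csv'
data = pd.read_csv(url)

# sequential write to the same DBFS FUSE path; this works
data.to_csv('/dbfs/tmp/export.csv', index=False)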

What could be missing?

Upvotes: 2

Views: 4215

Answers (1)

Saideep Arikontham

Reputation: 6104

  • Databricks has a limitation in that it does not allow random write operations into DBFS, as indicated in the SO thread you are referring to.

  • So, a workaround for this would be to write the file to the local file system (file:/) and then move it to the required location inside DBFS. You can use the following code:

import pandas as pd
url = 'https://www.stats.govt.nz/assets/Uploads/Business-price-indexes/Business-price-indexes-March-2019-quarter/Download-data/business-price-indexes-march-2019-quarter-csv.csv'
data = pd.read_csv(url)

with pd.ExcelWriter(r'export.xlsx', engine="openpyxl") as writer:
    # file will be written to /databricks/driver/, i.e., the driver's local file system
    data.to_excel(writer)
  • Calling dbutils.fs.ls("/databricks/driver/") fails because dbutils resolves that path to dbfs:/databricks/driver/ (the absolute path), which does not exist.

  • /databricks/driver/ belongs to the driver's local file system (DBFS is mounted into it at /dbfs). The absolute path of /databricks/driver/ is file:/databricks/driver/. You can list the contents of this path using either of the following:

import os
print(os.listdir("/databricks/driver/"))

# OR

dbutils.fs.ls("file:/databricks/driver/")


  • So, use the file located in this path and move (or copy) it to your destination using the shutil library, as follows:

from shutil import move

# /dbfs/ is the local FUSE mount of DBFS, so the destination resolves to dbfs:/tmp/export.xlsx
move('/databricks/driver/export.xlsx', '/dbfs/tmp/export.xlsx')
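As a sketch of an alternative (assuming the same file names as above), dbutils.fs.mv can perform the same move when both file systems are addressed explicitly:

# move from the driver's local disk (file:/) into DBFS (dbfs:/)
dbutils.fs.mv("file:/databricks/driver/export.xlsx", "dbfs:/tmp/export.xlsx")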

Upvotes: 3
