skp

Reputation: 324

How to read mount files with pyreadstat read_xport function in Databricks?

I mounted the .xpt file from Azure Storage to the Databricks DBFS path. With pyreadstat.read_xport I get the error below, even though test.xpt exists in the mount path folder.

Can anyone please let me know how to access the .xpt file, with or without mounting, using the pyreadstat.read_xport function?

mount_file_path = "/dbfs/mnt/test.xpt"

df_xpt, xpt_meta = pyreadstat.read_xport(mount_file_path)

PyreadstatError: File /dbfs/mnt/test.xpt does not exist!
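Since pyreadstat reads through the driver's local filesystem rather than through Spark, one way to narrow this down is to check whether the /dbfs/... FUSE path resolves at all before calling read_xport. A minimal sketch, reusing the question's path:

```python
import os

mount_file_path = "/dbfs/mnt/test.xpt"

# pyreadstat opens files via the local filesystem, so the mount must be
# visible at the /dbfs/... FUSE path on the driver node.
if not os.path.exists(mount_file_path):
    print(f"Not visible at {mount_file_path}; verify the mount, e.g. with dbutils.fs.ls('/mnt')")
```

If os.path.exists is False here, the problem is the mount or the path form, not pyreadstat itself.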

Upvotes: 1

Views: 461

Answers (1)

Vamsi Bitra

Reputation: 2729

I tried to reproduce the same in my environment and got the results below.

I created a sample DataFrame df and saved it to the /dbfs/demo.xpt location using pyreadstat's write operation.

Make sure pyreadstat is installed; you can install the package with: pip install pyreadstat.

import pandas as pd
import pyreadstat

df = pd.DataFrame([[1, 2.0, "A"], [3, 4.0, "B"]], columns=["k1", "k2", "k3"])
column_labels = ["Var 1", "Var 2", "Var 3"]
pyreadstat.write_xport(df, "/dbfs/demo.xpt", file_label="test", column_labels=column_labels)

I successfully accessed the .xpt file using the command below:

import pyreadstat

df, meta = pyreadstat.read_xport('/dbfs/demo.xpt', metadataonly=True)

Now you can check the returned metadata.

Update:

If you want to copy the .xpt file from Azure Storage to DBFS, you can use the code below:

# Set Blob storage configuration
spark.conf.set("fs.azure.account.key.vamblob.blob.core.windows.net", "<access_key>")

# Copy the xpt file from Azure storage to DBFS
dbutils.fs.cp("wasbs://[email protected]/<file_name>.xpt", "dbfs:/<your_file_name>")
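Note that dbutils.fs.cp targets a dbfs:/ URI, while pyreadstat needs the driver-local /dbfs/... form of the same path. A small helper sketching that translation (the name to_local_path is mine, not a Databricks API):

```python
def to_local_path(dbfs_path: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs FUSE path that pyreadstat can open."""
    if dbfs_path.startswith("dbfs:/"):
        return "/dbfs/" + dbfs_path[len("dbfs:/"):].lstrip("/")
    return dbfs_path  # already a local-style path

local = to_local_path("dbfs:/demo.xpt")  # "/dbfs/demo.xpt"
```

You can then pass the translated path straight to pyreadstat.read_xport(local).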


Upvotes: 1
