Reputation: 63
I'm trying to upload a sample PySpark DataFrame to Azure Blob Storage after converting it to Excel format, but I get the error shown after the code snippet below.
If there is another way to do this, please let me know.
from pyspark.sql.types import StructType,StructField, StringType, IntegerType
import pandas as ps
#%pip install xlwt
#%pip install openpyxl
#%pip install fsspec
my_data = [
    ("A", "1", "M", 3000),
    ("B", "2", "F", 4000),
    ("C", "3", "M", 4000),
]
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])
df = spark.createDataFrame(data=my_data,schema=schema)
pandasDF = df.toPandas()
pandasDF.to_excel("wasbs://[email protected]/output_file.xlsx")
ValueError: Protocol not known: wasbs
Upvotes: 0
Views: 694
Reputation: 4544
You are using the Python library pandas directly to write the data. That does not work here: as the error shows, the wasbs:// protocol is not recognized when pandas resolves the path. You need to mount the Azure Blob Storage container first and then write the data.
To mount, use the following command:
dbutils.fs.mount(
source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
mount_point = "/mnt/<mount-name>",
extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
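For reference, here is a minimal sketch of how the placeholders in the mount command could be filled in, assuming you authenticate with a storage account access key kept in a Databricks secret scope. The helper names `build_mount_args` and `mount_if_needed` are made up for illustration, and the `fs.azure.account.key.<storage-account-name>.blob.core.windows.net` config key applies to account-key authentication only (a SAS token uses a different key). The check against `dbutils.fs.mounts()` is there because mounting onto an existing mount point fails.

```python
def build_mount_args(container: str, account: str, mount_name: str) -> dict:
    """Assemble the source URI, mount point and config key for a wasbs mount."""
    host = f"{account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": f"/mnt/{mount_name}",
        # Assumption: account-key authentication.
        "conf_key": f"fs.azure.account.key.{host}",
    }


def mount_if_needed(dbutils, container, account, mount_name, scope, key_name):
    """Mount the container unless the mount point already exists (hypothetical helper)."""
    args = build_mount_args(container, account, mount_name)
    existing = [m.mountPoint for m in dbutils.fs.mounts()]
    if args["mount_point"] not in existing:
        dbutils.fs.mount(
            source=args["source"],
            mount_point=args["mount_point"],
            extra_configs={
                args["conf_key"]: dbutils.secrets.get(scope=scope, key=key_name)
            },
        )
    return args["mount_point"]
```

In a notebook you would call it with the built-in `dbutils` object, for example `mount_if_needed(dbutils, "<container-name>", "<storage-account-name>", "azurestorage", "<scope-name>", "<key-name>")`.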
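To write, use the commands below. Note that this writes the Spark DataFrame as CSV, and the path must match the mount point you chose (here /mnt/azurestorage):
(
    df.write
    .mode("overwrite")
    .option("header", "true")
    .csv("dbfs:/mnt/azurestorage/filename.csv")
)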
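If you need an actual .xlsx file rather than CSV, one possible approach is sketched below. It assumes the container is already mounted and that the mount is reachable through the local file path /dbfs/mnt/<mount-name> on the driver. The helpers `dbfs_local_path` and `write_excel_to_mount` are hypothetical names, not part of any library. The file is written to a temporary local directory first and then copied, because Excel writers need random-access writes, which may not be supported when writing directly under /dbfs.

```python
import os
import shutil
import tempfile


def dbfs_local_path(mount_name: str, filename: str) -> str:
    """Build the local file path for a file under a DBFS mount."""
    return f"/dbfs/mnt/{mount_name}/{filename}"


def write_excel_to_mount(pandas_df, mount_name: str, filename: str) -> str:
    """Write a pandas DataFrame as .xlsx locally, then copy it into the mount."""
    target = dbfs_local_path(mount_name, filename)
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_file = os.path.join(tmp_dir, filename)
        # Requires openpyxl to be installed for .xlsx output.
        pandas_df.to_excel(local_file, index=False)
        # Sequential copy into the mounted container.
        shutil.copyfile(local_file, target)
    return target
```

With the DataFrame from the question, that would be `write_excel_to_mount(df.toPandas(), "azurestorage", "output_file.xlsx")`. Keep in mind that `toPandas()` collects all rows to the driver, so this only suits data that fits in driver memory.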
Upvotes: 1