Reputation: 73
I've multiple XLSX files in the azure blob container which I should load those xlsx files snowflake table.
Note: I don't want to perform the conversion operation of xlsx to csv which actually consumes more time. Also I did used the ADF Copy Activity for loading but it is taking more time to load the data.
so is there any way I can use the snowflake snowpark to load the xlsx to the snowflake table?
Additionally, I welcome any other feasible alternatives for this issue through which I can load the xlsx data quickly into the snowflake
Upvotes: 0
Views: 1096
Reputation: 1298
Additionally, I welcome any other feasible alternatives for this issue through which I can load the xlsx data quickly into the snowflake
You can try databricks with pandas dataframe here. First Mount the Blob Storage to databricks and read the excel file using openpyxl
. Write the pandas dataframe to snowflake as suggested by @Sergiu.
import pandas as pd
storage_account_name = "storage account name"
container_name = "container name"
mount_point = "/mnt/mount point"
dbutils.fs.mount(
source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
mount_point=mount_point,
extra_configs={
f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": "storage account access key"
}
)
install
openpyxl
if you don't have.
!pip install openpyxl
Reading excel file after mounting:
file_path = f"/dbfs/{mount_point}/myexcel.xlsx"
df = pd.read_excel(file_path,engine="openpyxl")
After reading it as pandas dataframe, follow this procedure by @stephenallwright to write this dataframe to snowflake table.
Upvotes: 0