Reputation: 17
Within a Azure Databricks notebook, I am attempting to perform a transformation on some csv's which are in blob storage using the following:
*import os
import glob
import pandas as pd
os.chdir(r'wasbs://dalefactorystorage.blob.core.windows.net/dale')
allFiles = glob.glob("*.csv") # match your csvs
for file in allFiles:
df = pd.read_csv(file)
df = df.iloc[4:,] # read from row 4 onwards.
df.to_csv(file)
print(f"{file} has removed rows 0-3")*
Unfortunately I am getting the following error:
*FileNotFoundError: [Errno 2] No such file or directory: 'wasbs://dalefactorystorage.blob.core.windows.net/dale'
Am I missing something? (I am completely new to this).
Cheers,
Dale
Upvotes: 1
Views: 5923
Reputation: 55
A, alternative approach is to mount the dbfs file as a spark dataframe, and then just convert it from a sparkdf to a pandas df:
# mount blob storage
spark.conf.set("fs.azure.account.key.storageaccountname.blob.core.windows.net",
"storageaccesskey")
dfspark = spark.read.csv("wasbs://[email protected]
/filename.csv", header="true")
# convert from sparkdf to pandasdf
df = dfspark.toPandas()
Upvotes: -1
Reputation: 23111
If you want to use package pandas
to read CSV file from Azure blob process it and write
this CSV file to Azure blob in Azure Databricks, I suggest you mount Azure blob storage as Databricks filesystem then do that. For more details, please refer to here.
For example
dbutils.fs.mount(
source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
mount_point = "/mnt/<mount-name>",
extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net":"<account access key>"})
import os
import glob
import pandas as pd
os.chdir(r'/dbfs/mnt/<mount-name>/<>')
allFiles = glob.glob("*.csv") # match your csvs
for file in allFiles:
print(f" The old content of file {file} : ")
df= pd.read_csv(file, header=None)
print(df)
df = df.iloc[4:,]
df.to_csv(file, index=False,header=False)
print(f" The new content of file {file} : ")
df= pd.read_csv(file,header=None)
print(df)
break
Upvotes: 3