Nate

Reputation: 536

DBFS FileStore Equivalent in Azure Synapse?

Is there an equivalent to Databricks' DBFS FileStore system in Azure Synapse? Is it possible to upload csv files and read them into pandas dataframes within Azure Synapse notebooks? Ideally I'd like to not load the csv into a database; looking for something as simple as DBFS' FileStore folder.

In Databricks: pd.read_csv('/dbfs/FileStore/name_of_file.csv')

In Synapse: ?

I don't see anywhere to upload csv files directly like in DBFS:

enter image description here

Upvotes: 0

Views: 579

Answers (1)

Saideep Arikontham

Reputation: 6104

The Azure Synapse equivalent of the Databricks FileStore is the data lake file system linked to your Synapse workspace. In Synapse Studio, navigate to Data->Linked to find the linked storage account. This storage account is created or assigned when you create the workspace.

enter image description here

This primary data lake works much like the FileStore in Azure Databricks. You can upload files through the UI shown in the image above. To load a file into a DataFrame, right-click it and choose New notebook -> Load to DataFrame, as shown in the image below.

enter image description here

The UI automatically generates code that loads the csv file into a Spark DataFrame. You can modify this code to load the file as a pandas DataFrame instead.

'''
# Synapse generates this when you select a file and choose Load to DataFrame.
# <container> and <storage_account> are placeholders for your own names.

df = spark.read.load('abfss://<container>@<storage_account>.dfs.core.windows.net/sample_1.csv', format='csv'
## If header exists uncomment line below
##, header=True
)
display(df.limit(10))
'''
# Use the following code to load the file as a pandas DataFrame instead

import pandas as pd
df = pd.read_csv('abfss://<container>@<storage_account>.dfs.core.windows.net/sample_1.csv')
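
The pandas call above depends on getting the data lake URL right, so here is a minimal sketch of how that URL is assembled, assuming the usual ADLS Gen2 form abfss://<container>@<account>.dfs.core.windows.net/<path>. The helper names build_abfss_url and read_csv_from_lake are hypothetical and not part of any Synapse API; the read itself is assumed to run inside a Synapse notebook attached to a Spark pool.

```python
def build_abfss_url(container: str, account: str, path: str) -> str:
    """Return an abfss:// URL for a file in an ADLS Gen2 container.

    Hypothetical helper: it only formats a string and does not check
    that the container, account, or file actually exist.
    """
    # Drop any leading slash so the URL has exactly one '/' before the path.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_csv_from_lake(container: str, account: str, path: str, **read_csv_kwargs):
    """Read a CSV file from the data lake into a pandas DataFrame.

    Assumption: this runs in a Synapse notebook, where pandas can
    resolve abfss:// paths for the workspace's primary storage account.
    """
    import pandas as pd  # imported lazily so the URL helper works without pandas

    return pd.read_csv(build_abfss_url(container, account, path), **read_csv_kwargs)
```

In a notebook this would be called as read_csv_from_lake('mycontainer', 'myaccount', 'sample_1.csv'), where mycontainer and myaccount stand in for the names shown under Data->Linked.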

This data lake storage is connected to the workspace through a linked service (visible under Manage->Linked services). The linked service is created by default from the data lake and file system information that you must provide when creating the Synapse workspace.
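
If the file sits in a storage account other than the workspace default, pandas in a Synapse Spark pool can reportedly be pointed at a specific linked service through storage_options. The sketch below assumes the linked_service key described in Microsoft's pandas tutorial for Synapse; the helper name linked_service_options and the linked service name in the usage note are hypothetical.

```python
def linked_service_options(linked_service_name: str) -> dict:
    """Build the storage_options dict that selects a Synapse linked service.

    Assumption: the Synapse Spark runtime understands the 'linked_service'
    key in storage_options; outside Synapse this key has no effect.
    """
    if not linked_service_name:
        raise ValueError("linked_service_name must be a non-empty string")
    return {"linked_service": linked_service_name}
```

It would then be passed along with the file URL, for example pd.read_csv(url, storage_options=linked_service_options('MyLinkedService')), using a name listed under Manage->Linked services.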

Upvotes: 2
