Is there an equivalent to Databricks' DBFS FileStore system in Azure Synapse? Is it possible to upload csv files and read them into pandas dataframes within Azure Synapse notebooks? Ideally I'd like to not load the csv into a database; looking for something as simple as DBFS' FileStore folder.
In Databricks: pd.read_csv('/dbfs/FileStore/name_of_file.csv')
In Synapse: ?
I don't see anywhere to upload CSV files directly like there is in DBFS.
The Azure Synapse equivalent of FileStore in Databricks is the data lake file system linked to your Synapse workspace. In Synapse Studio, navigate to Data -> Linked to find the linked storage account; it was created or assigned when you created the workspace. You can upload files through this UI, or programmatically, as sketched below.
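If you want to push files into that linked storage from outside Synapse Studio, here is a minimal sketch using the azure-storage-file-datalake package. The account URL, container (file system) name, and file name below are placeholders, not values from your workspace:

```python
# A minimal sketch of uploading a local CSV to the linked data lake
# from outside Synapse Studio. Requires:
#   pip install azure-storage-file-datalake azure-identity
# The account URL, container (file system), and file names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://storageaccountname.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("filesystemname")

# Upload sample_1.csv, overwriting any existing copy
with open("sample_1.csv", "rb") as local_file:
    fs.get_file_client("sample_1.csv").upload_data(local_file, overwrite=True)
```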
This primary data lake functions much like FileStore in Azure Databricks. Once files are uploaded, you can right-click any file and load it into a DataFrame: choose New notebook -> Load to DataFrame.
The UI automatically generates code that loads the CSV file into a Spark DataFrame. You can modify this code to load the file as a pandas DataFrame instead.
```python
# This code is generated by Synapse when you select the file
# and choose New notebook -> Load to DataFrame
df = spark.read.load('abfss://[email protected]/sample_1.csv', format='csv'
    # If a header row exists, uncomment the line below
    # , header=True
)
display(df.limit(10))
```
Use the following code to load the file as a pandas DataFrame instead:

```python
import pandas as pd

# In Synapse notebooks, pandas can read abfss:// paths directly
df = pd.read_csv('abfss://[email protected]/sample_1.csv')
```
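If direct pandas reads of abfss:// paths are not supported on your Spark pool's runtime, a fallback sketch (reusing the same placeholder path as above) is to read with Spark and convert with toPandas():

```python
# Fallback sketch: read with Spark, then convert to pandas.
# The path reuses the placeholder account/container/file from the example above.
spark_df = spark.read.load(
    'abfss://[email protected]/sample_1.csv',
    format='csv',
    header=True  # assumption: the file has a header row
)
pdf = spark_df.toPandas()  # collects all rows to the driver; fine for small files
print(pdf.head())
```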
This data lake storage is linked to the workspace through a linked service (viewable under Manage -> Linked services). The linked service is created by default from the data lake and file system information that you must provide when creating the Synapse workspace.
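As a quick sanity check from inside a notebook, you can list the linked file system with mssparkutils, which Synapse notebooks provide (a sketch; the account and container names are the same placeholders used above):

```python
from notebookutils import mssparkutils  # available in Synapse notebooks

# List the root of the linked data lake file system (placeholder names)
for f in mssparkutils.fs.ls('abfss://[email protected]/'):
    print(f.name, f.size)
```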