StupendousEnzio

Reputation: 63

Read multiple JSON files from Blob Storage into a DataFrame using PySpark in Databricks

I am trying to read all the JSON files stored in a subfolder of a single container in Blob Storage. I have set up the environment in Databricks and linked the connection. Currently I am using this code:

df = spark.read.json("wasbs://container_name@blob_storage_account.blob.core.windows.net/sub_folder/*.json")

but I am getting only the first file, not all the JSON files in the subfolder, even though the path includes the wildcard /*.json.

I want to load all the files from the subfolder into a single DataFrame and store it as a table in a SQL database.

Can someone point out what I am missing?

Upvotes: 1

Views: 4204

Answers (1)

RamaraoAdapa

Reputation: 3119

I have tested this in my environment.

I have 3 JSON blob files inside the subfolder of my container in the storage account, and I am able to read all of them into a single DataFrame.


You can use the code below to display all the JSON files from the subfolder in a single DataFrame:

df = spark.read.json("wasbs://container_name@blob_storage_account.blob.core.windows.net/sub_folder/*.json")
df.show()
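Note that df.show() prints only the first 20 rows by default, so the output alone may not tell you whether every file was loaded. One way to check is to tag each row with the file it came from and count rows per file. The following is only a rough sketch: the helper names wasbs_glob and rows_per_file are made up for illustration, and the container, account, and folder values are the same placeholders as above. It assumes pyspark.sql.functions.input_file_name is usable on your cluster.

```python
def wasbs_glob(container: str, account: str, folder: str, pattern: str = "*.json") -> str:
    """Build a wasbs:// glob path for blobs under one folder of a container.

    Hypothetical helper for illustration; all arguments are placeholders.
    """
    folder = folder.strip("/")
    return f"wasbs://{container}@{account}.blob.core.windows.net/{folder}/{pattern}"


def rows_per_file(spark, path: str):
    """Read every JSON file matching `path` and count rows per source file."""
    # Imported lazily so the path helper above works without Spark installed.
    from pyspark.sql.functions import input_file_name

    # Add a column holding the full path of the file each row was read from.
    df = spark.read.json(path).withColumn("source_file", input_file_name())
    return df.groupBy("source_file").count()


# In a Databricks notebook, `spark` is already defined, so this should work as-is:
# path = wasbs_glob("container_name", "blob_storage_account", "sub_folder")
# rows_per_file(spark, path).show(truncate=False)
```

If the result lists one row per file in the subfolder, the wildcard read is picking up everything. If each file holds a single pretty-printed JSON document rather than one JSON object per line, it may also be worth trying spark.read.option("multiLine", True).json(path).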


Upvotes: 1
