Joel Cochran

Reputation: 7728

How to read file path values as columns in Spark?

I'm working in Azure Synapse Notebooks and reading file(s) into a DataFrame from a well-formed folder path like so:

[screenshot of the Spark read code using a wildcard in the folder path]

Given that many folders are matched by that wildcard, how do I capture the "State" value as a column in the resulting DataFrame?

Upvotes: 1

Views: 8167

Answers (2)

Steven

Reputation: 15258

No need to use the wildcard *.

Try:

df = spark.read.load("abfss://....dfs.core.windows.net/")

Spark can read partitioned folders directly, and df should then contain the State column with its different values.
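Spark's partition discovery works because the folders follow the Hive-style key=value naming convention. As an illustration only (this is not Spark's actual implementation, and the path below is a made-up example), a minimal Python sketch of how partition values can be recovered from such a path:

```python
import re

def parse_partition_values(path):
    """Extract Hive-style key=value directory segments from a file path."""
    # Each partition segment looks like ".../State=WA/..." in the path.
    return dict(re.findall(r"([^/=]+)=([^/]+)(?=/)", path))

# Hypothetical path following the layout described in the question
path = "abfss://container@account.dfs.core.windows.net/data/State=WA/part-00000.snappy.parquet"
print(parse_partition_values(path))  # {'State': 'WA'}
```

This is why pointing spark.read.load at the parent folder is enough: Spark scans the directory names and materializes each key as a column.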

Upvotes: 0

vladsiv

Reputation: 2936

Use the input_file_name function to get the full input path, then apply regexp_extract to pull out the part you want.

Example:

from pyspark.sql import functions as F

df = df.withColumn("filepath", F.input_file_name())
df = df.withColumn("filepath", F.regexp_extract("filepath", r"State=(.+)\.snappy\.parquet", 1))
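As a sanity check outside Spark, the same regex can be exercised with Python's re module. Note the pattern assumes the file name itself contains "State=" immediately before the value (the example path below is hypothetical):

```python
import re

pattern = r"State=(.+)\.snappy\.parquet"

# Hypothetical file path of the shape the pattern expects
filename = "abfss://container@account.dfs.core.windows.net/data/State=WA.snappy.parquet"

match = re.search(pattern, filename)
print(match.group(1))  # WA
```

If the files instead sit inside State=... folders (as in the question), the capture group would also swallow the trailing file name, so a narrower pattern such as r"State=([^/]+)" may be a safer choice.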

Upvotes: 3
