sanjayr

Reputation: 1939

PySpark: reading in parquet files, how to check the total number and size of the files?

I am reading in a directory of parquet files for my input data.

Is there a way to count the total number of files read into the dataframe, and to get the size of those files?

I am on Spark 2.4.4.

Upvotes: 0

Views: 1199

Answers (1)

jayrythium

Reputation: 767

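from pyspark.sql.functions import input_file_name

df = df.withColumn('input_file', input_file_name()) # tags each row with the path of the file it was read from

df.select('input_file').distinct().count() # number of distinct files that contributed rows (df.count() alone returns the number of rows, not files)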
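For the file sizes, here is a minimal sketch, assuming the parquet files sit on a local filesystem so that input_file_name() yields URIs like file:///path/to/part-0000.parquet. The helper name summarize_local_files is hypothetical, not part of PySpark. For HDFS or S3 you would need to ask the storage layer for the sizes instead of os.path.getsize. Also note that a file with zero rows never shows up in input_file_name(), so it would not be counted.

```python
import os
from urllib.parse import unquote, urlparse


def summarize_local_files(file_uris):
    """Return (file_count, total_bytes, {path: bytes}) for local file URIs.

    Hypothetical helper: assumes every URI points at a file on the local disk.
    """
    sizes = {}
    for uri in set(file_uris):  # de-duplicate, one entry per file
        # Strip the "file://" scheme and undo any percent-encoding in the path.
        path = unquote(urlparse(uri).path)
        sizes[path] = os.path.getsize(path)
    return len(sizes), sum(sizes.values()), sizes


# Usage with the DataFrame from the question (not run here):
# from pyspark.sql.functions import input_file_name
# rows = df.select(input_file_name().alias('input_file')).distinct().collect()
# n_files, total_bytes, per_file = summarize_local_files(r['input_file'] for r in rows)
```

Collecting the distinct file names to the driver should be fine for a directory of parquet files, since there is one short string per file rather than per row.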

Upvotes: 1
