Reputation: 1939
I am reading in a directory of parquet files for my input data.
Is there a way to count the total number of files read into the dataframe, as well as get the size of those files?
I am on Spark 2.4.4
Upvotes: 0
Views: 1199
Reputation: 767
from pyspark.sql.functions import input_file_name
df = df.withColumn('input_file', input_file_name())  # tags each row with the path of its source file
df.select('input_file').distinct().count()  # number of distinct files read in (df.count() would count rows, not files)
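For the file-size part of the question, here is a minimal sketch (the input path is hypothetical, and spark._jvm / spark._jsc are internal PySpark handles) that counts the distinct source files and sums their sizes through the Hadoop FileSystem API:

from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/path/to/parquet/dir")  # hypothetical input directory

# Distinct file paths actually backing the DataFrame
files = [r[0] for r in df.select(input_file_name()).distinct().collect()]
print("files read:", len(files))

# Sum the file sizes via the Hadoop FileSystem API, reached through the JVM gateway
jvm = spark._jvm
hadoop_conf = spark._jsc.hadoopConfiguration()
total_bytes = 0
for f in files:
    path = jvm.org.apache.hadoop.fs.Path(f)
    fs = path.getFileSystem(hadoop_conf)
    total_bytes += fs.getFileStatus(path).getLen()  # file length in bytes
print("total bytes:", total_bytes)

This sums the sizes of only the files the DataFrame actually reads; listing the directory directly would also include any non-data files such as _SUCCESS markers.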
Upvotes: 1