eljiwo

Reputation: 846

Fastest way to check the row count of a Parquet table in DBFS with PySpark?

I have a table on DBFS that I can read with PySpark, but I only need to know its length (number of rows). I know I could just read the file and run table.count() to get it, but that would take some time.

Is there a better way to solve this?
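For reference, this is the baseline I would like to avoid (the path below is just a placeholder, not my real location):

```python
# Read the table and trigger a count job -- this is the step that takes time.
# "dbfs:/mnt/some/table" is a placeholder path.
table = spark.read.parquet("dbfs:/mnt/some/table")
print(table.count())
```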

Upvotes: 1

Views: 877

Answers (1)

YFl

Reputation: 1429

I am afraid not.

Since you are using DBFS, I assume you are using the Delta format on Databricks. So, theoretically, you could check the metastore, but:

The metastore is not the source of truth about the latest information of a Delta table

https://docs.delta.io/latest/delta-batch.html#control-data-location
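If you still want to try it, here is a minimal sketch of what "checking the metastore" could look like. The table name is a placeholder, and the number you get back is only as fresh as the last time statistics were collected, which is exactly the caveat above:

```python
# Sketch only: read the row count the metastore has recorded for a
# (hypothetical) table. The "Statistics" entry exists only if stats were
# collected, e.g. via ANALYZE TABLE ... COMPUTE STATISTICS, and it can be
# stale relative to the actual Delta table contents.
stats = (
    spark.sql("DESCRIBE TABLE EXTENDED my_db.my_table")
    .filter("col_name = 'Statistics'")
    .collect()
)
print(stats)  # e.g. [Row(col_name='Statistics', data_type='1234 bytes, 5678 rows', comment='')]
```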

Upvotes: 2
