DIP

Reputation: 1

Difference between the count of a DataFrame and the temp view created from that DataFrame

Step 1: I have a DataFrame created from a Delta table.

Df = spark.read.format("delta").load(path)

Step 2: I create a temp view from that DataFrame.

Df.createOrReplaceTempView("dfview")

Now when I run a count on these two objects, they show different counts:

SELECT count(*) FROM dfview -- value 1

%sql
SELECT count(*) FROM Df -- value 2

Can anyone please help me understand why value 1 and value 2 are different?

Upvotes: 0

Views: 57

Answers (1)

Pratik Lad

Reputation: 8402

If your DataFrame (Df) has been cached in memory before the createOrReplaceTempView step, the cached version may not reflect any updates or changes to the underlying Delta table.

  • Clear the cache before querying the temporary view or the DataFrame to get a fresh count:
spark.catalog.clearCache()

Also, check the row count of the DataFrame directly with the count() function. In Spark, count() is an action that returns the number of rows in a DataFrame.

 Df.count()
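
To compare the two counts side by side, the steps above can be combined into one helper. This is only a minimal sketch, not a definitive diagnosis: the function name compare_counts is hypothetical, and it assumes an active SparkSession (spark) and the DataFrame from the question are passed in.

```python
def compare_counts(spark, df, view_name="dfview"):
    """Return (dataframe_count, view_count), both computed after clearing the cache.

    `compare_counts` is a hypothetical helper for illustration only.
    `spark` is assumed to be an active SparkSession and `df` a DataFrame.
    """
    # Drop all cached tables and DataFrames so neither count comes from stale data.
    spark.catalog.clearCache()

    # Register (or overwrite) the temp view from the same DataFrame.
    df.createOrReplaceTempView(view_name)

    # Count through the DataFrame API; count() is an action and triggers a job.
    df_count = df.count()

    # Count through SQL against the temp view; the single result row holds the count.
    view_count = spark.sql(f"SELECT count(*) AS n FROM {view_name}").collect()[0]["n"]

    return df_count, view_count


# Usage in a notebook where `spark` and `path` already exist, as in the question:
# Df = spark.read.format("delta").load(path)
# df_count, view_count = compare_counts(spark, Df)
# print(df_count, view_count)
```

If both numbers match here, the difference most likely comes from what the SQL cell is actually querying. SQL resolves names against the catalog (tables and registered views), so a Python variable such as Df is not visible there unless a table or view with that name exists.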

Upvotes: 0
