Compare the row counts of two DataFrames and assign the one with the higher row count to a new DataFrame object

Question

I have two physical nodes that are not synchronised.

Both nodes produce captured data (the two-node setup was put in place for resilience).

I am facing the following challenge:

The nodes produce two identical files (timestamps may not be the same, and there is no unique identifier that could be used to remove duplicates). Both frames share the same schema.

Is there a way to write something like the following for DataFrames in PySpark?

df3 = case
         when df1.count() > df2.count() then df1
         else df2
      end

Answers (1)
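
A minimal sketch of one way to express the pseudocode above, assuming `df1` and `df2` are PySpark DataFrames that have already been loaded. The helper name `larger_frame` is hypothetical and not part of PySpark; it relies only on the standard `DataFrame.count()` method. Since a DataFrame is an ordinary Python object, a plain conditional expression is enough and no SQL-style `CASE` is needed.

```python
def larger_frame(df1, df2):
    """Return whichever frame has more rows (df2 when the counts are equal).

    Hypothetical helper: works with any objects exposing a .count() method,
    such as PySpark DataFrames.
    """
    # count() is a Spark action that triggers a job over the data,
    # so call it once per frame and reuse the results.
    n1 = df1.count()
    n2 = df2.count()
    # Mirrors the pseudocode: df1 only if it is strictly larger, else df2.
    return df1 if n1 > n2 else df2
```

With two DataFrames in scope, this would be used as `df3 = larger_frame(df1, df2)`. If the frames are expensive to compute, caching them before counting may avoid recomputing them when `df3` is used later, though whether that helps depends on the data size and the cluster.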