Alessia Lesyte

Reputation: 39

Compare the row counts of two DataFrames and assign the one with more rows to a new DataFrame

I have two physical nodes that are not synchronised.

Both nodes produce captured data. (The two-node setup was put in place for resilience.)

I am facing the following challenge:

Is there a way to write something like this for DataFrames in PySpark?

df3= case 
         when df1.count()<df2.count() then  df2,
         when df1.count()>df2.count() then  df1,
         ELSE df1

Upvotes: 0

Views: 386

Answers (1)

Alessia Lesyte

Reputation: 39

I resolved this by defining a comparison function:

def compare(df1, df2):
    # count() triggers a Spark job each time, so compute each count once
    n1, n2 = df1.count(), df2.count()
    # return the DataFrame with more rows; df1 wins ties
    return df2 if n2 > n1 else df1

It seems DataFrames can be passed around and returned like any other Python object, so this kind of selection can be done with an ordinary function.
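
In case it helps anyone with more than two nodes, here is a rough sketch of the same idea for any number of DataFrames. The name larger_of is made up for this example and is not part of PySpark; the only thing it assumes about its inputs is that each one has a count() method, as a PySpark DataFrame does.

```python
def larger_of(*dfs):
    """Return the DataFrame with the most rows; the earliest one wins ties."""
    if not dfs:
        raise ValueError("expected at least one DataFrame")
    # count() runs a Spark job each time it is called, so count each input once.
    counts = [df.count() for df in dfs]
    # list.index returns the first position of the maximum,
    # so on a tie the earliest argument is kept.
    return dfs[counts.index(max(counts))]


# Usage, assuming df1 and df2 are existing DataFrames:
# df3 = larger_of(df1, df2)
```

Since each count() still scans its DataFrame, it may be worth caching the inputs first if they are expensive to compute and will be reused afterwards.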

Upvotes: 0
