Reputation: 39
I have two physical nodes that are not synchronised with each other.
Both nodes produce captured data. (The two-node setup was put in place for resilience.)
I am facing the following challenge:
Is there a way in PySpark to choose between two DataFrames with something like:
df3 = case
    when df1.count() < df2.count() then df2
    when df1.count() > df2.count() then df1
    else df1
Upvotes: 0
Views: 386
Reputation: 39
I resolved this case by defining a comparison function:
def compare(df1, df2):
    if df1.count() > df2.count():
        return df1
    if df1.count() < df2.count():
        return df2
    else:
        return df1
It seems DataFrames can be passed around like any other Python object, so this kind of selection can be done with an ordinary function.
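One thing to watch: every count() call is an action that runs a Spark job, and the function above can call it up to four times. Also, when() from pyspark.sql.functions builds column expressions, so it cannot pick between whole DataFrames; plain Python control flow on the driver is what does the job here. A minimal sketch that counts each DataFrame only once (the name pick_larger is my own, not part of PySpark, and it assumes both inputs expose a count() method as PySpark DataFrames do):

```python
def pick_larger(df1, df2):
    """Return whichever input has more rows; on a tie, return df1."""
    # count() triggers a full Spark job, so evaluate it once per DataFrame
    # and compare the plain integers afterwards.
    n1 = df1.count()
    n2 = df2.count()
    # df2 wins only when it is strictly larger; otherwise keep df1,
    # which matches the ELSE df1 branch in the question.
    return df2 if n2 > n1 else df1
```

Usage would be df3 = pick_larger(df1, df2). If the DataFrames are expensive to compute and are used again afterwards, caching them before counting may be worth considering.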
Upvotes: 0