Reputation: 13
I have two PySpark DataFrames. I need to compare them column-wise and append the comparison result next to the columns.
DF1:
Claim_number | Claim_Status |
---|---|
1001 | Closed |
1002 | In Progress |
1003 | open |
DF2:
Claim_number | Claim_Status |
---|---|
1001 | Closed |
1002 | open |
1004 | In Progress |
Expected Result in pySpark:
DF3:
Claim_number_DF1 | Claim_number_DF2 | Comparison_of_Claim_number | Claim_status_DF1 | Claim_status_DF2 | Comparison_of_Claim_Status |
---|---|---|---|---|---|
1001 | 1001 | TRUE | Closed | Closed | TRUE |
1002 | 1002 | TRUE | In Progress | Open | FALSE |
1003 | 1004 | FALSE | open | In Progress | FALSE |
Upvotes: 0
Views: 39
Reputation: 596
DataFrames are not ordered; their rows are distributed across partitions, so comparing rows by position is not a valid ask.
However, what you can do instead is join the two DataFrames on a key column. If matching rows by Claim_number is what you are after, then here is the solution:
final_df = df1.join(df2, "Claim_number", "inner").distinct()
Upvotes: 0