samol

Reputation: 20610

How to chain joins in scala spark

I have a list of dataframes that I need to join:

My current method is a bit ugly:

testdf1
    .join(testdf2, Seq("uuid","datestr"), "outer")
    .join(testdf3, Seq("uuid","datestr"), "outer")
    .join(testdf4, Seq("uuid","datestr"), "outer")
    .join(testdf5, Seq("uuid","datestr"), "outer")
    .join(testdf6, Seq("uuid","datestr"), "outer")
    .join(testdf7, Seq("uuid","datestr"), "outer")

Given a Seq of dataframes, is there a way to apply the same join operation to all of them?

Seq(testdf1,testdf2,testdf3,testdf4,testdf5,testdf6,testdf7)

How to write a generic function that joins them all?

Upvotes: 1

Views: 857

Answers (1)

L. CWI

Reputation: 962

Given

val dataframes = Seq(testdf1,testdf2,testdf3,testdf4,testdf5,testdf6,testdf7)

you can use reduceLeft:

val joinedDF = dataframes.reduceLeft((df1, df2) => 
    df1.join(df2, Seq("uuid", "datestr"), "outer")
)
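If you want the generic function the question asks for, the reduceLeft call can be wrapped in a small helper. A minimal sketch, assuming the join keys and join type should be parameters (the name joinAll and its defaults are mine, not from the original post); note that reduceLeft throws on an empty Seq, so callers must pass at least one dataframe:

    import org.apache.spark.sql.DataFrame

    // Hypothetical helper: folds a non-empty Seq of dataframes into one,
    // joining each pair on the given key columns with the given join type.
    def joinAll(dfs: Seq[DataFrame],
                keys: Seq[String] = Seq("uuid", "datestr"),
                joinType: String = "outer"): DataFrame =
      dfs.reduceLeft((acc, df) => acc.join(df, keys, joinType))

    // Usage:
    // val joinedDF = joinAll(Seq(testdf1, testdf2, testdf3,
    //                            testdf4, testdf5, testdf6, testdf7))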

Upvotes: 1
