Reputation: 1030
I am doing join of 2 data frames and select all columns of left frame for example:
val join_df = first_df.join(second_df, first_df("id") === second_df("id") , "left_outer")
in above I want to do select first_df.* .How can I select all columns of one frame in join ?
Upvotes: 9
Views: 32610
Reputation: 71
Just to add one possibility, whithout using alias, I was able to do that in pyspark with
first_df.join(second_df, "id", "left_outer").select( first_df["*"] )
Not sure if applies here, but hope it helps
Upvotes: 0
Reputation: 651
Suppose you:
Then, you might do the following:
val selectColumns = df1.columns.map(df1(_)) ++ Array(df2("field1"), df2("field2"))
df1.join(df2, df1("key") === df2("key")).select(selectColumns:_*)
Upvotes: 3
Reputation: 341
We can also do it with leftsemi join. leftsemi join will select the data from left side dataframe from a joined dataframe.
Here we join two dataframes df1 and df2 based on column col1.
df1.join(df2, df1.col("col1").equalTo(df2.col("col1")), "leftsemi")
Upvotes: 3
Reputation:
With alias:
first_df.alias("fst").join(second_df, Seq("id"), "left_outer").select("fst.*")
Upvotes: 22