Reputation: 2618
I have two dataframes aaa_01 and aaa_02 in Apache Spark 2.1.0.
And I perform an Inner Join on these two dataframes selecting few colums from both dataframes to appear in the output.
The Join is working perfectly fine but the output dataframe has the column names as it was present in the input dataframes. I get stuck here. I need to have new column names instead of getting the same column names in my output dataframe.
Sample Code is given below for reference
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select("a.col1","a.col2","b.col4")
I am getting the output dataframe with column names as "col1, col2, col3". I tried to modify the code as below but in vain
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select("a.col1","a.col2","b.col4" as "New_Col")
Any help is appreciated. Thanks in advance.
Edited
I browsed and got similar posts which is given below. But I do not see an answer to my question.
Updating a dataframe column in spark
Renaming Column names of a Data frame in spark scala
The answers in this post : Spark Dataframe distinguish columns with duplicated name are not relevant to me as it is related more to pyspark than Scala and it had explained how to rename all the columns of a dataframe whereas my requirement is to rename only one or few columns.
Upvotes: 2
Views: 2523
Reputation: 41987
you can .as
alias as
import sqlContext.implicits._
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select($"a.col1".as("first"),$"a.col2".as("second"),$"b.col4".as("third"))
or you can use .alias
as
import sqlContext.implicits._
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select($"a.col1".alias("first"),$"a.col2".alias("second"),$"b.col4".alias("third"))
if you are looking to update only one column name then you can do
import sqlContext.implicits._
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select($"a.col1", $"a.col2", $"b.col4".alias("third"))
Upvotes: 3
Reputation: 3069
You want to rename columns of the dataset, the fact that your dataset comes from a join does not change anything. Yo can try any example from this answer, for instance :
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner")
.select("a.col1","a.col2","b.col4")
.withColumnRenamed("col4","New_col")
Upvotes: 5