Reputation: 473
When I try to join two DataFrames using
DataFrame joindf = dataFrame.join(df, df.col(joinCol)); //.equalTo(dataFrame.col(joinCol)));
my program throws the following exception:
org.apache.spark.sql.AnalysisException: join condition 'url' of type string is not a boolean.;
Here the value of joinCol is url. What could be causing this exception?
Upvotes: 3
Views: 6608
Reputation: 1164
You cannot use df.col(joinCol) on its own as the join condition: it is a column of type string, not a boolean expression. To join two DataFrames you need to say how the column on one side is compared with the column on the other.
Say you have two DataFrames, empDF and deptDF. Joining them should look like this in Scala:
empDF.join(deptDF,empDF("emp_dept_id") === deptDF("dept_id"),"inner")
.show(false)
This example is taken from Spark SQL Join DataFrames
Upvotes: 0
Reputation: 1092
What that means is that the join condition must evaluate to a boolean expression. Let's say we want to join two DataFrames on id; we can do it like this:
With Python:
df1.join(df2, df1['id'] == df2['id'], 'left')  # the 3rd parameter is the join type, here a left join
With Scala:
df1.join(df2, df1("id") === df2("id")) // inner join (the default) on the id column
Upvotes: 2
Reputation: 330433
The join variants that take a Column as the second argument expect it to evaluate to a boolean expression.
If you want a simple equi-join based on a column name, use the version that takes the column name as a String:
String joinCol = "foo";
dataFrame.join(df, joinCol);
Upvotes: 3