Reputation: 739
I am trying to perform a left anti join (`leftanti`) in Spark 2.4, but I am running into errors.
Dataset<Row> df = spark.read()
.option("mode", "DROPMALFORMED")
.schema(schema)
.csv("/some/path/cars.csv");
Dataset<Row> Audis = df.filter(col("Make").equalTo("Audi"));
Dataset<Row> BMWs = df.filter(col("Make").equalTo("BMW"));
Audis.join(BMWs, "Make").show();
df.as("df").join(Audis.as("audi"), col("Make"), "leftanti")
.show();
The first join works fine, but for the leftanti I get the following error:
org.apache.spark.sql.AnalysisException: Reference 'Make' is ambiguous, could be: df.Make, audi.Make.;
Why would this be ambiguous? It should know which column to check for 'IS NOT NULL' in this kind of join.
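For intuition, a left anti join keeps only the left-side rows whose join key has no match on the right side. A minimal plain-Java sketch of that semantics (illustrative collections only, not the Spark API; the data values are made up):

```java
import java.util.*;
import java.util.stream.*;

public class LeftAntiDemo {
    public static void main(String[] args) {
        // Left side: all makes in the dataset; right side: the Audis.
        List<String> allMakes = Arrays.asList("Audi", "BMW", "Audi", "Ford");
        Set<String> audiMakes = new HashSet<>(Collections.singletonList("Audi"));

        // Left anti join semantics: keep left rows whose key has NO match on the right.
        List<String> nonAudis = allMakes.stream()
                .filter(make -> !audiMakes.contains(make))
                .collect(Collectors.toList());

        System.out.println(nonAudis); // [BMW, Ford]
    }
}
```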
Other examples show this in Scala by providing a column expression, but this seems impossible in Java, as there is no method signature that accepts an expression string like 'df.Make == audi.Make':
// No method exists for such a signature
df.as("df").join(Audis.as("audi"), "df.Make == audi.Make", "leftanti")
So far all examples I have seen of this type of join are not in Java, can someone explain why this error is occurring and what is a working example?
Upvotes: 1
Views: 522
Reputation: 739
After consulting with some colleagues and spending a few hours on it together, we figured it out. Both datasets derive from the same DataFrame, so the bare column name Make resolves to a column on each side of the join; you need to qualify both sides with their aliases using col("MyColumn").equalTo(col("OtherColumn")).
This example works:
df.as("df").join(Audis.as("audi"), col("df.Make").equalTo(col("audi.Make")), "leftanti")
Upvotes: 2