Igneous01
Igneous01

Reputation: 739

Apache Spark - 'LeftAnti' join Ambiguous column error

I am trying to perform a leftanti join in spark 2.4 but I am running into errors.

Dataset<Row> df = spark.read()
            .option("mode", "DROPMALFORMED")
            .schema(schema)
            .csv("/some/path/cars.csv");

Dataset<Row> Audis = df.filter(col("Make").equalTo("Audi"));
Dataset<Row> BMWs = df.filter(col("Make").equalTo("BMW"));
Audis.join(BMWs, "Make").show();

df.as("df").join(Audis.as("audi"), col("Make"), "leftanti")
           .show();

The first join works fine, but for the leftanti I get the following error:

org.apache.spark.sql.AnalysisException: Reference 'Make' is ambiguous, could be: df.Make, audi.Make.;

Why would this be ambiguous? It should know which column should be checking for 'IS NOT NULL' in this kind of join.

Other examples show this in Scala by providing a column expression, but this seems like it's not possible in Java as there is no method signature that supports an expression string like 'df.Make == Audi.Make'

// No method exists for such a signature
df.as("df").join(Audis.as("audi"), "df.Make == audi.Make", "leftanti")

So far all examples I have seen of this type of join are not in Java, can someone explain why this error is occurring and what is a working example?

Upvotes: 1

Views: 522

Answers (1)

Igneous01
Igneous01

Reputation: 739

Consulting with some colleagues and spending a few hours together. You need to use col("MyColumn").equalTo(col("OtherColumn")).

This example works:

df.as("df").join(Audis.as("audi"), col("df.Make").equalTo(col("audi.Make")), "leftanti")

Upvotes: 2

Related Questions