Sergio Dalla Valle

Reputation: 407

spark join change equalTo function

I have two datasets and I would like to join the two tables when the value in a column of one contains the value in a column of the other. How can I turn this:

val df = df1.join(df2, 
    df1.col("Complete Name").equalTo(df2.col("Name")))

into something like this (`ifContain` stands for the operation I'm looking for):

val df = df1.join(df2, 
    df1.col("Complete Name").ifContain(df2.col("Name")))
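For reference, Spark's `Column` already exposes a `contains` method that builds exactly this kind of substring-match condition, so it can be used directly as a join predicate. A minimal runnable sketch, with invented sample data and a local `SparkSession`:

```java
// Sketch using Spark's built-in Column.contains; the sample data and
// local SparkSession setup are made up for illustration.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ContainsJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("contains-join")
                .getOrCreate();

        // Two tiny tables built with inline VALUES (hypothetical sample data)
        Dataset<Row> df1 = spark.sql(
                "SELECT * FROM VALUES ('John Smith'), ('Jane Doe') AS t(`Complete Name`)");
        Dataset<Row> df2 = spark.sql(
                "SELECT * FROM VALUES ('John'), ('Jane') AS t(Name)");

        // Column.contains yields a boolean Column, usable as a join condition
        Dataset<Row> joined = df1.join(df2,
                df1.col("Complete Name").contains(df2.col("Name")));

        joined.show();

        spark.stop();
    }
}
```

Here "John Smith" contains "John" and "Jane Doe" contains "Jane", so the join keeps both matching pairs. Note this is a non-equi join, so Spark cannot use a hash or sort-merge strategy and will fall back to a (potentially expensive) nested-loop comparison.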

Upvotes: 0

Views: 179

Answers (2)

Tomasz Krol

Reputation: 668

What if you do something like this?

df1.join(df2, df1.col("Complete Name").contains(df2.col("Name")), "left_anti")
  .union(df2.join(df1, df1.col("Complete Name").contains(df2.col("Name")), "left_anti"))

Didn't test it though.

Upvotes: 0

Dici

Reputation: 25980

How about:

Dataset<Row> d1 = datasetFromJsonStrings(listOf("{\n" +
    "  \"key\": \"name\",\n" +
    "  \"origin\": \"left\"\n" +
"}"));

Dataset<Row> d2 = datasetFromJsonStrings(listOf("{\n" +
    "  \"key\": \"complete name\",\n" +
    "  \"origin\": \"right\"\n" +
"}"));

// [name,left,complete name,right]
List<Row> rows = d1.join(d2, d2.col("key").contains(d1.col("key"))).collectAsList();

Note: I did it in Java out of convenience because my entire codebase is in Java, not Scala.

Upvotes: 2
