Reputation: 2074
In the Spark API:
column.like("only takes a static string with optional wildcards like %")
column.contains(accepts_a_column_but_wont_parse_wildcards)
So what is the equivalent method for comparing values with wildcards when the pattern itself comes from a string column on the other side of the join?
Example that fails because like() accepts a literal string, not a Column:
.join(other_df, column.like(concat("%", $"column_potentially_with_wildcards", "%")), "left")
Upvotes: 0
Views: 1257
Reputation: 1380
Looking at the source, like() appears to accept only a literal value as a convenience. Hopefully this will be expanded in a future release, but for now you can define your own function to compensate:
import org.apache.spark.sql.catalyst.expressions.Like
import org.apache.spark.sql.Column
// Wraps the catalyst Like expression so that the pattern can be a Column, not just a literal
def columnLike(a: Column, b: Column): Column = new Column(Like(a.expr, b.expr))
...
scala> val df1 = List("aaaa", "bbbb", "aaaabbbbcccc", "abcd", "abc").toDS()
df1: org.apache.spark.sql.Dataset[String] = [value: string]
scala> val df2 = List("a%b%c").toDS()
df2: org.apache.spark.sql.Dataset[String] = [value: string]
scala> df1.join(df2, columnLike(df1("value"), df2("value"))).show
+------------+-----+
|       value|value|
+------------+-----+
|aaaabbbbcccc|a%b%c|
|         abc|a%b%c|
+------------+-----+
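Since Like lives in the internal catalyst package, its constructor may differ between Spark versions. As a possible alternative that stays on the public API, the SQL LIKE operator can take a column on its right-hand side, so the join condition can be written with functions.expr. This is only a minimal sketch: the object name LikeJoinSketch, the helper likeJoin, and the column names text and pattern are made up for illustration, and it assumes a local SparkSession.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object LikeJoinSketch {
  // Hypothetical helper: keeps the (text, pattern) pairs where the value in
  // the "text" column matches the LIKE pattern stored in the "pattern" column.
  // Assumes the two column names are unique across both DataFrames.
  def likeJoin(texts: DataFrame, patterns: DataFrame): DataFrame =
    texts.join(patterns, expr("text LIKE pattern"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("like-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the answer above, with explicit column names
    val texts = List("aaaa", "bbbb", "aaaabbbbcccc", "abcd", "abc").toDF("text")
    val patterns = List("a%b%c").toDF("pattern")

    likeJoin(texts, patterns).show()
    spark.stop()
  }
}
```

For the substring-style match from the question, the condition could presumably be written as expr("text LIKE concat('%', pattern, '%')"), and "left" can be passed as the third argument to join as usual.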
Upvotes: 1