Reputation: 143
I have a dataframe df1 of the format
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| A | z | m |
| B | w | n |
| C | x | o |
| A | z | n |
| A | p | o |
+------+------+------+
and another dataframe df2 of the format
+------+------+
| Col1 | Col2 |
+------+------+
| 0-A | 0-z |
| 1-B | 3-w |
| 2-C | 1-x |
| | 2-P |
+------+------+-
I am trying to replace the values in Col1 and Col2 of df1 with values from df2 using Spark Java.
The end dataframe df3 should look like this.
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 0-A | 0-z | m |
| 1-B | 3-w | n |
| 2-C | 1-x | o |
| 0-A | 0-z | n |
| 0-A | 2-p | o |
+------+------+------+
I am trying to replace all the values in the column1 and column2 of df1 with values from col1 and col2 of df2. Is there anyway that i can achieve this in Spark Java dataframe syntax.?
The initial idea i had was to do the following.
String pattern1="\\p{L}+(?: \\p{L}+)*$";
df1=df1.join(df2, df1.col("col1").equalTo(regexp_extract(df2.col("col1"),pattern1,1)),"left-semi");
Upvotes: 0
Views: 1471
Reputation: 2431
Replace your last join
operation with below join.
df1.alias("x").join(df2.alias("y").select(col("y.Col1").alias("newCol1")), col("x.Col1") === regexp_extract(col("newCol1"),"\\p{L}+(?: \\p{L}+)*$",0), "left")
.withColumn("Col1", col("newCol1"))
.join(df2.alias("z").select(col("z.Col2").alias("newCol2")), col("x.Col2") === regexp_extract(col("newCol2"),"\\p{L}+(?: \\p{L}+)*$",0), "left")
.withColumn("Col2", col("newCol2"))
.drop("newCol1", "newCol2")
.show(false)
+----+----+----+
|Col1|Col2|Col3|
+----+----+----+
|2-C |1-x |o |
|0-A |0-z |m |
|0-A |0-z |n |
|0-A |2-p |o |
|1-B |3-w |n |
+----+----+----+
Upvotes: 1