Reputation: 3782
I have a DataFrame with two columns col1
and col2
(Spark 2.2.0 and Scala 2.11). I need to create a new column in the following format:
=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)
where AAA
is the value of col1
and BBB
is the value of col2
for a given row.
The problem is that I do not know how to properly handle "
. I tried this:
df = df.withColumn($"url",=path("http://mywebsite.com/photo/"+col("col1") + "_"+col("col2")+".jpg", 1))"
UPDATE:
It compiles ok now, but column values are not inserted in a string. Instead of column values, I see the text col1
and col2
.
df = df.withColumn("url_rec",lit("=path('http://mywebsite.com/photo/"+col("col1")+"_"+col("col1")+".jpg', 1)"))
I get this:
=path('http://mywebsite.com/photo/col1_col1.jpg', 1)
Upvotes: 1
Views: 1977
Reputation: 1677
This is an old question but I put my answer here for anybody else. You can use the format_string
function
scala> df1.show()
+----+----+
|col1|col2|
+----+----+
| AAA| BBB|
+----+----+
scala> df1.withColumn(
"URL",
format_string(
"""=path("http://mywebsite.com/photo/%s_%s.jpg", 1)""",
col("col1"),
col("col2")
)
).show(truncate = false)
+----+----+--------------------------------------------------+
|col1|col2|URL |
+----+----+--------------------------------------------------+
|AAA |BBB |=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)|
+----+----+--------------------------------------------------+
Upvotes: 0
Reputation: 10082
As stated in the comments, you can either use concat
multiple times like :
d.show
+---+---+
| a| b|
+---+---+
|AAA|BBB|
+---+---+
d.withColumn("URL" ,
concat(
concat(
concat(
concat(lit("""=path("http://mywebsite.com/photo/""" ), $"a") ,
lit("_") ) , $"b"
)
, lit(""".jpg", 1) """)
).as[String].first
// String = "=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1) "
Or you can map over the dataframe to append a new column ( which is cleaner than the concat
method )
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val urlRdd = d.map{ x =>
Row.fromSeq(x.toSeq ++ Seq(s"""=path("http://mywebsite.com/photo/${x.getAs[String]("a")}_${x.getAs[String]("b")}.jpg", 1)"""))
}
val newDF = sqlContext.createDataFrame(urlRdd, d.schema.add("url", StringType) )
newDF.map(_.getAs[String]("url")).first
// String = =path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)
Upvotes: 1