Markus
Markus

Reputation: 3782

How to properly format the string in a new column of DataFrame?

I have a DataFrame with two columns col1 and col2 (Spark 2.2.0 and Scala 2.11). I need to create a new column in the following format:

=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)

where AAA is the value of col1 and BBB is the value of col2 for a given row.

The problem is that I do not know how to properly handle ". I tried this:

df = df.withColumn($"url",=path("http://mywebsite.com/photo/"+col("col1") + "_"+col("col2")+".jpg", 1))"

UPDATE:

It compiles ok now, but column values are not inserted in a string. Instead of column values, I see the text col1 and col2.

df = df.withColumn("url_rec",lit("=path('http://mywebsite.com/photo/"+col("col1")+"_"+col("col1")+".jpg', 1)"))

I get this:

=path('http://mywebsite.com/photo/col1_col1.jpg', 1)

Upvotes: 1

Views: 1977

Answers (2)

Iraj Hedayati
Iraj Hedayati

Reputation: 1677

This is an old question but I put my answer here for anybody else. You can use the format_string function

scala> df1.show()
+----+----+
|col1|col2|
+----+----+
| AAA| BBB|
+----+----+

scala> df1.withColumn(
            "URL",
            format_string(
                """=path("http://mywebsite.com/photo/%s_%s.jpg", 1)""", 
                col("col1"), 
                col("col2")
            )
        ).show(truncate = false)

+----+----+--------------------------------------------------+
|col1|col2|URL                                               |
+----+----+--------------------------------------------------+
|AAA |BBB |=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)|
+----+----+--------------------------------------------------+

Upvotes: 0

philantrovert
philantrovert

Reputation: 10082

As stated in the comments, you can either use concat multiple times like :

d.show
+---+---+
|  a|  b|
+---+---+
|AAA|BBB|
+---+---+

d.withColumn("URL" , 
   concat(
       concat(
           concat(
               concat(lit("""=path("http://mywebsite.com/photo/""" ), $"a") ,
               lit("_") ) , $"b" 
           ) 
           , lit(""".jpg", 1) """) 
         ).as[String].first

// String = "=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1) "

Or you can map over the dataframe to append a new column ( which is cleaner than the concat method )

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val urlRdd = d.map{ x => 
     Row.fromSeq(x.toSeq ++ Seq(s"""=path("http://mywebsite.com/photo/${x.getAs[String]("a")}_${x.getAs[String]("b")}.jpg", 1)""")) 
    }

val newDF = sqlContext.createDataFrame(urlRdd, d.schema.add("url", StringType) )

newDF.map(_.getAs[String]("url")).first
// String = =path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)

Upvotes: 1

Related Questions