jesky

Reputation: 57

Save spark DataFrame to csv file with map<string,string> column type

I have written a udf function that converts Map[String,String] values to String:

 udf("mapToString", (input: Map[String,String]) => input.mkString(","))

spark-shell gives me this error:

    <console>:24: error: overloaded method value udf with alternatives:
  (f: AnyRef,dataType: org.apache.spark.sql.types.DataType)org.apache.spark.sql.expressions.UserDefinedFunction <and> 
...
cannot be applied to (String, Map[String,String] => String)
       udf("mapToString", (input: Map[String,String]) => input.mkString(","))

Is there any method to convert a column of Map[String,String] values to string values? I need this conversion because I need to save the DataFrame as a CSV file.

Upvotes: 2

Views: 2181

Answers (1)

Ramesh Maharjan

Reputation: 41957

Assuming that you have a DataFrame as

+---+--------------+
|id |map           |
+---+--------------+
|1  |Map(200 -> DS)|
|2  |Map(300 -> CP)|
+---+--------------+

with the following schema

root
 |-- id: integer (nullable = false)
 |-- map: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

You can write a udf which looks like:

def mapToString = udf((map: collection.immutable.Map[String, String]) => 
                       map.mkString.replace(" -> ", ","))
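To see what the UDF body actually does, here is the same transformation in plain Scala (no Spark needed). Note that `mkString` with no separator renders a single-entry map as `key -> value`, so the `replace` turns it into `key,value`; for maps with more than one entry you would want to pass a separator to `mkString` (e.g. `mkString(";")`) so the entries don't run together:

```scala
// Plain-Scala illustration of the UDF body above.
val single = Map("200" -> "DS")
val converted = single.mkString.replace(" -> ", ",")
// converted is "200,DS"

// With more than one entry, an explicit separator keeps entries apart.
val multi = Map("200" -> "DS", "300" -> "CP")
val convertedMulti = multi.mkString(";").replace(" -> ", ",")
// convertedMulti is "200,DS;300,CP"
```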

and apply it with the withColumn API:

df.withColumn("map", mapToString($"map"))

You should then have a final DataFrame where the Map column has been changed to String:

+---+------+
|id |map   |
+---+------+
|1  |200,DS|
|2  |300,CP|
+---+------+

root
 |-- id: integer (nullable = false)
 |-- map: string (nullable = true)
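Once the map column is a plain string, the DataFrame can be written out as CSV without the unsupported-type error. A minimal sketch (the output path is hypothetical, adjust to your environment):

```scala
// Convert the map column to a string, then write as CSV.
df.withColumn("map", mapToString($"map"))
  .write
  .option("header", "true")
  .csv("/tmp/output")  // hypothetical path
```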

Upvotes: 4
