sravs

Reputation: 43

convert string value to map using scala

I have a CSV file in which one of the fields contains a map, stored as the string "Map(12345 -> 45678, 23465 -> 9876)".

When I load the CSV into a DataFrame, that column is read as a string, so I have written a UDF to convert the string to a map as below:

 val convertToMap = udf((pMap: String) => {
   val mpp = pMap
   // "Map(12345 -> 45678, 23465 -> 9876)"
   val stg = mpp.substr(4, mpp.length() - 1)
   val stg1 = stg.split(regex = ",").toList
   val mp = stg1.map(_.split(regex = " ").toList)
   val mp1 = mp.map(mp => (mp(0), mp(2))).toMap
 })

Now I need help applying the UDF to the column that is currently a string, so that I get back a DataFrame with the converted column.

Upvotes: 0

Views: 1585

Answers (1)

Travis Hegner

Reputation: 2495

You are pretty close, but your UDF looks like a mix of Scala and Python, and the parsing logic needs a little work. There may be a better way to parse a map literal string, but this works with the provided example:

val convertToMap = udf { (pMap: String) =>
  // strip the leading "Map(" and the trailing ")"
  val stg = pMap.substring(4, pMap.length() - 1)
  // split into "key -> value" entries
  val stg1 = stg.split(",").toList.map(_.trim)
  // each entry splits on spaces into List(key, "->", value)
  val mp = stg1.map(_.split(" ").toList)
  mp.map(mp => (mp(0), mp(2))).toMap
}

val df = spark.createDataset(Seq("Map(12345 -> 45678, 23465 -> 9876)")).toDF("strMap")

With the corrected UDF, you simply invoke it with a .select() or a .withColumn():

df.select(convertToMap($"strMap").as("map")).show(false)

Which gives:

+----------------------------------+
|map                               |
+----------------------------------+
|Map(12345 -> 45678, 23465 -> 9876)|
+----------------------------------+

With the schema:

root
 |-- map: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
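If you would rather keep the original string column alongside the parsed one, `.withColumn()` works the same way; a minimal sketch (the new column name `map` is just an illustration):

```scala
// Adds a parsed "map" column while keeping "strMap" intact.
val df2 = df.withColumn("map", convertToMap($"strMap"))
df2.printSchema()
// root
//  |-- strMap: string (nullable = true)
//  |-- map: map (nullable = true)
```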

Upvotes: 1
