kushagra mittal

Reputation: 343

Convert DataFrame Format

I have a DataFrame with the following schema:

 |-- id: string (nullable = true)
 |-- epoch: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

and I want to convert it so that each map entry becomes its own row:

 |-- id: string (nullable = true)
 |-- epoch: string (nullable = true)
 |-- key: string (nullable = true)
 |-- value: string (nullable = true)

Example:

From:

1,12345, [pq -> r, ab -> c]

To:

1,12345, pq ,r
1,12345, ab ,c

I am trying this code, but it doesn't work:

val array2Df = array1Df.flatMap(line =>
              line.getMap[String, String](2).map(
                (line.getString(0),line.getString(1),_)
            )) 

Upvotes: 0

Views: 48

Answers (1)

Ehsan Ullah Nazir

Reputation: 1917

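Try the following:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{MapType, StringType, StructType}

// Sample rows: an id, an epoch, and a map of key/value pairs
val arrayData = Seq(
  Row("1", "epoch_1", Map("epoch_1_key1" -> "epoch_1_val1", "epoch_1_key2" -> "epoch_1_Val2")),
  Row("2", "epoch_2", Map("epoch_2_key1" -> "epoch_2_val1", "epoch_2_key2" -> "epoch_2_Val2"))
)

// Schema with a map<string,string> column, as in the question
val arraySchema = new StructType()
  .add("Id", StringType)
  .add("epoch", StringType)
  .add("data", MapType(StringType, StringType))

val df = spark.createDataFrame(spark.sparkContext.parallelize(arrayData), arraySchema)
df.printSchema()
df.show(false)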


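After that, explode the data column. Applied to a map column, explode produces one row per map entry, with two new columns named key and value. Don't forget the imports:

import org.apache.spark.sql.functions.explode
import spark.implicits._  // enables the $"column" syntax

// Keep epoch in the select so the result has id, epoch, key, and value
df.select($"Id", $"epoch", explode($"data")).show(false)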
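For reference, here is a minimal self-contained sketch of the explode approach applied to the sample row from the question. The object name, the local master setting, and the app name are assumptions made only for illustration. It also includes a typed flatMap variant that is closer to the code in the question; this is one possible way to write it, assuming the map entries are unpacked into a flat four-element tuple rather than nested inside a three-element one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Hypothetical object name, used only for this sketch.
object ExplodeMapExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-map-example")
      .getOrCreate()
    import spark.implicits._

    // Sample row taken from the question: id, epoch, and a map column.
    val df = Seq(("1", "12345", Map("pq" -> "r", "ab" -> "c")))
      .toDF("id", "epoch", "data")

    // explode on a map column yields one row per entry,
    // with columns named key and value by default.
    val exploded = df.select($"id", $"epoch", explode($"data"))
    exploded.printSchema()
    exploded.show(false)

    // Typed alternative: flatten each map entry into (id, epoch, key, value).
    val viaFlatMap = df
      .as[(String, String, Map[String, String])]
      .flatMap { case (id, epoch, data) =>
        data.toSeq.map { case (k, v) => (id, epoch, k, v) }
      }
      .toDF("id", "epoch", "key", "value")
    viaFlatMap.show(false)

    spark.stop()
  }
}
```

Both variants should give the id, epoch, key, value layout requested in the question. The explode version is usually the simpler choice because it stays within the DataFrame API and needs no tuple encoders.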


Upvotes: 2
