Harini
Harini

Reputation: 65

Reading Key-Value pairs in a text file, key as column names and values as rows using Scala and Spark

I have a text file in following format.

<Begin  
Id=1  
Name=John  
Age=32  
<End  

<Begin  
Id=2  
Name=Jack  
Age=20  
<End

What I want to do is create a data frame in spark using Scala, taking the above keys as column names and their values as the rows in the data frame. Note that I have several Begin-End sections as well as several key values pairs.

I have taken the keys and values to separate columns and tried to convert the key row values to column names but failed . This is what I have done:

    val test = textRDD.filter(rows => !rows.contains("<Begin")).filter(rows => !rows.contains("<End")).map {
      line =>
        val har = line.split("=")
        Table1(har(0), har(1))

    }
    val testDF = test.toDF()
    testDF.show()

This is my case class

case class Table1(columNames:String, valuesCol:String)

Output -:

+------------+-----------+
| columNames | valuesCol |
+------------+-----------+
| Id         | 1         |
| Name       | John      |
| Age        | 32        |
| Id         | 2         |
| Name       | Jack      |
| Age        | 20        |
+------------+-----------+

Required output -:

+----+------+-----+  
| Id | Name | Age |  
+----+------+-----+  
|  1 | John |  32 |  
|  2 | Jack |  20 |  
+----+------+-----+  


I'm using Spark 2.4.0

Thanks.

Upvotes: 1

Views: 614

Answers (1)

Lamanus
Lamanus

Reputation: 13541

Try this one. In order to use the pivot, you should have the separate key for grouping those values as a row. So, I added it and use pivot to the dataframe.

val w = Window.rowsBetween(Window.unboundedPreceding, Window.currentRow)

df.withColumn("group", when('columNames === lit("Id"), 1))
  .withColumn("group", sum('group).over(w))
  .groupBy("group").pivot("columNames", Seq("Id", "Name", "Age")).agg(first("valuesCol"))
  .drop("group")
  .show()

+---+----+---+
| Id|Name|Age|
+---+----+---+
|  1|John| 32|
|  2|Jack| 20|
+---+----+---+

Upvotes: 1

Related Questions