Anil Kumar
Anil Kumar

Reputation: 565

Table data conversion using Spark Scala

I've the Hive table data, i would need some help on transforming the data into a below shown "Expected data form".

Two things on the note:

  1. omit the columns which are showing null, like omit the columns abs, ada, adw on the first row.
  2. for those columns which are of type array(ex: abs, ada, adw, alt) & value is not null, include the column name within the array as shown in the Expected data form and keep an outer column name called EVENTS.

Is there a way i can achieve this using Spark-sql, or do i need write some scala UDF. I need the solution in spark-scala. Any help would be much appreciated.

Hive Table data:

     _______________________________________________________________________________________________________
   | vin               |  tt |msg_type |  abs  |ada                     |  adw  | alt                      |
   |___________________|_____|_________|_______|________________________|_______|__________________________|
   | FU7XXXXXXXXXXXXXX |  0  |SIGNAL   | (null)|(null)                  | (null)|[{"E":15XXXXXXXX,"V":0.0}]|
   |__________________ |_____|_________|______ |________________________|_______|__________________________|
   | FSXXXXXXXXXXXXXXX |  0  |SIGNAL   | (null)|[{"E":15XXXXXXXX,"V":1}]| (null)|  (null)                  |
   |___________________|_____|_________|_______|________________________|_______|__________________________|

Expected data:

       _______________________________________________________________________
   | vin               |  tt |msg_type |  EVENTS                              |
   |___________________|_____|_________|______________________________________|
   | FU7XXXXXXXXXXXXXX |  0  |SIGNAL   | [{"SN":"alt","E":15XXXXXXXX,"V":0.0}]|
   |__________________ |_____|_________|______ _______________________________|
   | FSXXXXXXXXXXXXXXX |  0  |SIGNAL   | [{"SN":"ada","E":15XXXXXXXX,"V":1}]  |                
   |___________________|_____|_________|______________________________________|

Upvotes: 0

Views: 243

Answers (1)

Nikhil Suthar
Nikhil Suthar

Reputation: 2431

I have updated your input data and also declaring one String variable to make UDF generic.

 scala> df.show(false)
    +-----------------+---+--------+--------------------------+--------------------------+--------------------------+--------------------------+
    |vin              |tt |msg_type|abs                       |ada                       |adw                       |alt                       |
    +-----------------+---+--------+--------------------------+--------------------------+--------------------------+--------------------------+
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |null                      |null                      |[{"E":15XXXXXXXX,"V":0.0}]|
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |[{"E":15XXXXXXXX,"V":1}]  |null                      |null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"E":15XXXXXXXX,"V":2}]  |null                      |null                      |null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |null                      |[{"E":15XXXXXXXX,"V":3}]  |null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |null                      |[{"E":15XXXXXXXX,"V":4.1}]|[{"E":15XXXXXXXX,"V":4.2}]|
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"E":15XXXXXXXX,"V":5.1}]|null                      |[{"E":15XXXXXXXX,"V":5.2}]|null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"E":15XXXXXXXX,"V":6.1}]|[{"E":15XXXXXXXX,"V":6.2}]|[{"E":15XXXXXXXX,"V":6.3}]|null                      |
    +-----------------+---+--------+--------------------------+--------------------------+--------------------------+--------------------------+

    //String of column name that we need to check for null in Event
    scala> val SingalColumns = "abs,ada, adw,alt"
    SingColumns: String = abs,ada, adw,alt

    //UDF declaration 
    scala> def EventUDF:UserDefinedFunction = udf((flagCol:String, r:Row) => {var signal = ""
         |             val flagColList:List[String] = flagCol.reverse.split(s""",""").map(x => x.trim).mkString(",").reverse.split(s",").toList
         |             flagColList.foreach{ x => 
         |                   if (r.getAs(x) != null){
         |                     signal= signal + "," + """{"SN":""" + x.toString +"," + r.getAs(x).toString.replaceFirst("\\[\\{","").replaceFirst("\\]","")
         |                   }
         |             }
         |             signal.replaceFirst(s""",""", """[""").concat("]")
         |             })

    //final DataFrame
    scala> df.withColumn("Event", EventUDF(lit(SingalColumns),struct(df.columns map col: _*))).select("vin","tt","msg_type","Event").show(false)
    +-----------------+---+--------+-------------------------------------------------------------------------------------------------------+
    |vin              |tt |msg_type|Event                                                                                                  |
    +-----------------+---+--------+-------------------------------------------------------------------------------------------------------+
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":alt,"E":15XXXXXXXX,"V":0.0}]                                                                    |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":ada,"E":15XXXXXXXX,"V":1}]                                                                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":abs,"E":15XXXXXXXX,"V":2}]                                                                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":adw,"E":15XXXXXXXX,"V":3}]                                                                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":adw,"E":15XXXXXXXX,"V":4.1},{"SN":alt,"E":15XXXXXXXX,"V":4.2}]                                  |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":abs,"E":15XXXXXXXX,"V":5.1},{"SN":adw,"E":15XXXXXXXX,"V":5.2}]                                  |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":abs,"E":15XXXXXXXX,"V":6.1},{"SN":ada,"E":15XXXXXXXX,"V":6.2},{"SN":adw,"E":15XXXXXXXX,"V":6.3}]|
    +-----------------+---+--------+-------------------------------------------------------------------------------------------------------+

Upvotes: 1

Related Questions