Reputation: 441
I am using Spark 1.6 with Scala.
I have created a DataFrame which looks like this:
DATA
SKU, MAKE, MODEL, GROUP, SUBCLS, IDENT
IM, AN4032X, ADH3M032, RM, 1011, 0
IM, A3M4936, MP3M4936, RM, 1011, 0
IM, AK116BC, 3M4936P, 05, ABC, 0
IM, A-116-B, 16ECAPS, RM, 1011, 0
I am doing data validation and want to capture, in a new DataFrame, any record that violates the rules.
Rule:
Column “GROUP” must be character
Column “SUBCLS” must be NUMERIC
Column “IDENT” must be 0
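The three rules above can be sketched as plain string predicates (hypothetical helper names, not Spark code), where "Y" marks a violation and "N" a pass, matching the audit-flag convention below:

```scala
// Hypothetical helpers mirroring the three rules; "Y" = violation, "N" = pass.
def auditGroup(s: String): String  = if (s.matches("^[A-Za-z]+$")) "N" else "Y" // alphabetic only
def auditSubcls(s: String): String = if (s.matches("^[0-9]+$")) "N" else "Y"    // numeric only
def auditIdent(s: String): String  = if (s == "0") "N" else "Y"                 // must be exactly 0

println(auditGroup("RM"))    // passes -> N
println(auditSubcls("ABC"))  // violates -> Y
println(auditIdent("0"))     // passes -> N
```

The same regexes can then be reused inside `rlike` on the DataFrame columns.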
The new DataFrame will look like this:
AUDIT TABLE
SKU, MAKE, AUDIT_SKU, AUDIT_MAKE, AUDIT_MODEL, AUDIT_GRP, AUDIT_SUBCLS, AUDIT_IDENT
IM, A-K12216BC, N, N, N, Y, Y, N
Y represents a rule violation and N represents a rule pass.
I have validated the rules using isNull or a regex. For example, checking the GROUP column with a regex:
df.where($"GROUP".rlike("^[A-Za-z]+$")).show
Could someone please help me do this in Spark SQL? Is it possible to create a DataFrame for the above scenario?
Thanks
Upvotes: 1
Views: 131
Reputation: 1892
You can use rlike, combining alternatives with | where needed, inside when/otherwise:
scala> val audited = df
  .withColumn("Group1",  when($"GROUP".rlike("^[A-Za-z]+$"), "N").otherwise("Y"))
  .withColumn("SUBCLS1", when($"SUBCLS".rlike("^[0-9]+$"), "N").otherwise("Y"))
  .withColumn("IDENT1",  when($"IDENT" === "0", "N").otherwise("Y"))
scala> audited.show()
+---+-------+--------+-----+------+-----+------+-------+------+
|SKU| MAKE| MODEL|GROUP|SUBCLS|IDENT|Group1|SUBCLS1|IDENT1|
+---+-------+--------+-----+------+-----+------+-------+------+
| IM|AN4032X|ADH3M032| RM| 1011| 0| N| N| N|
| IM|A3M4936|MP3M4936| 1RM| 1011| 0| Y| N| N|
| IM|AK116BC| 3M4936P| 05| ABC| 0| Y| Y| N|
| IM|A-116-B| 16ECAPS| RM1| 1011| 0| Y| N| N|
+---+-------+--------+-----+------+-----+------+-------+------+
I added a *1 version of each column for clarity only; you can overwrite the original columns instead. Let me know if you need any more help on this.
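Since the question asks for Spark SQL specifically, here is a sketch under the assumption that the DataFrame has been registered as a temp table named "products" (via registerTempTable, the Spark 1.6 API); the table name and output aliases are illustrative:

```scala
// Sketch only: assumes df.registerTempTable("products") has already been called.
// GROUP is a reserved word in SQL, so it is escaped with backticks.
// CASE WHEN emits "N" for a pass and "Y" for a violation, same as the audit table.
val auditSql = """
  SELECT SKU, MAKE,
         CASE WHEN `GROUP` RLIKE '^[A-Za-z]+$' THEN 'N' ELSE 'Y' END AS AUDIT_GRP,
         CASE WHEN SUBCLS  RLIKE '^[0-9]+$'    THEN 'N' ELSE 'Y' END AS AUDIT_SUBCLS,
         CASE WHEN IDENT = '0'                 THEN 'N' ELSE 'Y' END AS AUDIT_IDENT
  FROM products
"""
// val audit = sqlContext.sql(auditSql)
```

Running the query through sqlContext.sql gives the same flags as the withColumn version above.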
Upvotes: 1