Sophie Dinka

Reputation: 83

Reading encoded values in Spark 1.6 throws an error

I receive a file from an API that has encoded (non-ASCII) character values in 3 columns. I read the file into a DataFrame in Spark 1.6 and query it like this:

val CleanData = sqlContext.sql("""SELECT
                                               COL1,
                                               COL2,
                                               COL3
                                               FROM CLEANFRAME
                                               """ )

The encoded value looks like this:

[image]

But after reading, the encoded value appears like this:

53004, �����������������������������

Could someone please help me fix this, if possible with Spark 1.6 and Scala?

Upvotes: -1

Views: 44

Answers (1)

sangam.gavini

Reputation: 196

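This can be achieved by using regexp_replace (written for Spark 1.6, where sc and sqlContext are available in the shell):
    import org.apache.spark.sql.functions.regexp_replace
    import sqlContext.implicits._ // enables toDF and the $"col" syntax
    val df = sc.parallelize(List(("503004","d$üíõ$F|'.h*Ë!øì=(.î;      ,.¡|®!®","3-2-704"))).toDF("col1","col2","col3")
    // Remove every character that is not an ASCII letter.
    df.withColumn("col2_new", regexp_replace($"col2", "[^a-zA-Z]", "")).show()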
Output:
+------+--------------------+-------+--------+
|  col1|                col2|   col3|col2_new|
+------+--------------------+-------+--------+
|503004|d$üíõ$F|'.h*Ë!øì=...|3-2-704|     dFh|
+------+--------------------+-------+--------+
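
Note that stripping characters throws data away. If you want to keep the original values, the replacement characters usually mean the bytes were decoded with the wrong charset. Here is a minimal sketch in plain Scala (no Spark) of that effect. It assumes the file is ISO-8859-1, which is only a guess since the question does not say; `DecodeSketch` and `decode` are hypothetical names used for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DecodeSketch {
  // Decode the first `length` bytes of `bytes` using an explicitly chosen charset.
  def decode(bytes: Array[Byte], length: Int, charset: Charset): String =
    new String(bytes, 0, length, charset)

  def main(args: Array[String]): Unit = {
    // "üíõË" written with escapes so the source file encoding does not matter.
    val original = "\u00FC\u00ED\u00F5\u00CB"
    // ASSUMPTION: the API wrote the file as ISO-8859-1 (one byte per character).
    val raw = original.getBytes(StandardCharsets.ISO_8859_1)

    // Decoding those bytes as UTF-8 is invalid, so Java substitutes U+FFFD.
    val wrong = decode(raw, raw.length, StandardCharsets.UTF_8)
    // Decoding with the charset the data was written in restores the text.
    val right = decode(raw, raw.length, StandardCharsets.ISO_8859_1)

    println(wrong.contains('\uFFFD'))
    println(right == original)
  }
}
```

In Spark 1.6, sc.textFile decodes lines as UTF-8, so one way to apply the same idea might be to read the file with sc.hadoopFile[LongWritable, Text, TextInputFormat](path) and decode each Text value from its getBytes and getLength with the correct charset before building the DataFrame. I have not tested that against your file.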

Upvotes: 0
