Reputation: 83
I am receiving a file from an API that has encoded (non-ASCII) character values in 3 columns. I read the file into a DataFrame in Spark 1.6 and select the columns like this:
val CleanData = sqlContext.sql("""SELECT
COL1,
COL2,
COL3
FROM CLEANFRAME
""" )
The encoded values should keep their original non-ASCII characters, but they appear like this:
53004, �����������������������������
Can someone please help me fix this, if possible with Spark 1.6 and Scala?
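The � symbols are U+FFFD, the replacement character a decoder emits for bytes it cannot interpret. A likely cause (an assumption, since the question does not say which charset the API uses) is that the file is written in a single-byte charset such as ISO-8859-1 while Spark decodes it as UTF-8. This plain-Scala sketch, using a hypothetical sample string, decodes the same bytes both ways:

```scala
import java.nio.charset.StandardCharsets

object CharsetDemo {
  // Hypothetical sample: text the API might send, stored as ISO-8859-1 bytes.
  val original = "üíõËøì"
  val latin1Bytes: Array[Byte] = original.getBytes(StandardCharsets.ISO_8859_1)

  // Wrong charset: none of these bytes form valid UTF-8 sequences,
  // so each malformed byte is replaced by U+FFFD (shown as "�").
  val wrong: String = new String(latin1Bytes, StandardCharsets.UTF_8)

  // Charset the bytes were actually written in: the text is recovered.
  val right: String = new String(latin1Bytes, StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(wrong)
    println(right)
  }
}
```

To apply this in Spark 1.6, one option (again assuming ISO-8859-1) is to read the raw lines with sc.hadoopFile[LongWritable, Text, TextInputFormat](path) and decode each line with new String(text.getBytes, 0, text.getLength, "ISO-8859-1"), because sc.textFile always decodes lines as UTF-8. If the file is a CSV read through the spark-csv package, its charset option serves the same purpose.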
Upvotes: -1
Views: 44
Reputation: 196
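This can be achieved with regexp_replace. The pattern below strips every character that is not an ASCII letter, so the non-ASCII characters are removed rather than decoded: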
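import org.apache.spark.sql.functions.regexp_replace
import sqlContext.implicits._ // Spark 1.6 shell; on Spark 2.x+ use spark.implicits._ and spark.sparkContext instead of sc
val df = sc.parallelize(List(("503004","d$üíõ$F|'.h*Ë!øì=(.î; ,.¡|®!®","3-2-704"))).toDF("col1","col2","col3")
df.withColumn("col2_new", regexp_replace($"col2", "[^a-zA-Z]", "")).show() // keep only ASCII letters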
Output:
+------+--------------------+-------+--------+
|  col1|                col2|   col3|col2_new|
+------+--------------------+-------+--------+
|503004|d$üíõ$F|'.h*Ë!øì=...|3-2-704|     dFh|
+------+--------------------+-------+--------+
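Spark's regexp_replace uses Java regular expressions, so a pattern can be tried in plain Scala with String.replaceAll before running it on a DataFrame. This sketch uses the sample value from the answer; the second pattern, [^\x20-\x7E], is an illustrative variant (not part of the answer) that drops only characters outside printable ASCII and keeps punctuation such as $ and |. Both discard the non-ASCII characters rather than recovering them.

```scala
object RegexCheck {
  // Sample value from the answer above.
  val sample = "d$üíõ$F|'.h*Ë!øì=(.î; ,.¡|®!®"

  // Same pattern as the answer: remove everything that is not an ASCII letter.
  val lettersOnly: String = sample.replaceAll("[^a-zA-Z]", "")

  // Illustrative variant: remove only characters outside printable ASCII
  // (space through '~'), keeping punctuation such as '$' and '|'.
  val printableAscii: String = sample.replaceAll("[^\\x20-\\x7E]", "")

  def main(args: Array[String]): Unit = {
    println(lettersOnly) // dFh
    println(printableAscii)
  }
}
```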
Upvotes: 0