eugen-fried

Reputation: 2173

Spark SQL - convert string to ASCII

We have an issue where one of the producers pushes Unicode strings into a field that should be ASCII. The job is currently pure-SQL configurable, so I would like to know whether it is possible to convert a Unicode string to ASCII using just Spark SQL, similar to the solution given in this question (this will of course mean possible data loss for unsupported characters, but that is not a concern).

Upvotes: 0

Views: 4782

Answers (2)

stack0114106

Reputation: 8711

You can remove the unwanted characters using regexp_replace():

scala> spark.sql(""" SELECT regexp_replace(decode(encode('Ä??ABCDE', 'utf-8'), 'ascii'), "[^\t\n\r\x20-\x7F]","")  x """).show(false)
+-----+
|x    |
+-----+
|ABCDE|
+-----+
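
The same expression works on a table column rather than a literal; a minimal sketch, assuming a hypothetical temp view events with a string column name:

scala> Seq("ÄÊÍABCDE").toDF("name").createOrReplaceTempView("events")

scala> spark.sql(""" SELECT regexp_replace(decode(encode(name, 'utf-8'), 'ascii'), "[^\t\n\r\x20-\x7F]", "") AS name_ascii FROM events """).show(false)
+----------+
|name_ascii|
+----------+
|ABCDE     |
+----------+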



Upvotes: 1

mck

Reputation: 42392

Try encode:

SELECT encode(column, 'ascii') as column;

for example:

spark-sql> select encode('ÄÊÍABCDE', 'ascii');
???ABCDE
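
Note that encode returns a BINARY value, so the result above is raw bytes rather than a string. If the downstream consumer expects a STRING column, a decode round-trip keeps the type; a minimal sketch using the same literal (unmappable characters come back as '?'):

spark-sql> select decode(encode('ÄÊÍABCDE', 'ascii'), 'ascii');
???ABCDE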

Upvotes: 2
