Manish Visave

Reputation: 166

PySpark Encoding

I'm working with data that an Alteryx workflow saves to S3; we then use PySpark to write that data into Redshift.

Now the problem is that the file is encoded as ANSI/OEM Japanese Shift-JIS, as shown below.

[screenshot: file encoding shown as ANSI/OEM Japanese Shift-JIS]

When reading it in PySpark with the shift-jis encoding, it still fails to parse the data properly. For example:

[screenshots: one sample value is decoded to mojibake, while another sample value is decoded correctly]

So my question is: is ANSI/OEM Shift-JIS different from the shift-jis encoding used in PySpark, and if so, what can be used to resolve the issue?

Upvotes: 0

Views: 43

Answers (1)

Vaibhav K

Reputation: 3076

You can use encoding='cp932' when reading the file.

The key difference might be that PySpark is decoding with plain Shift-JIS, while your data is encoded in a Windows-specific variant, cp932 (what Windows labels ANSI/OEM Shift-JIS). cp932 is a superset of Shift-JIS that adds NEC special characters and IBM extensions, so some byte sequences it produces have no mapping in strict Shift-JIS. Try using cp932 as the encoding in PySpark and see if it resolves the issue.
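To make the difference concrete, here is a small stand-alone Python check (independent of Spark): the byte pair 0x87 0x40 falls in the NEC extension rows that cp932 adds, so it decodes under cp932 but not under strict shift_jis.

raw = b"\x87\x40"

# cp932 maps 0x8740 to the NEC special character '①'
print(raw.decode("cp932"))

# strict Shift-JIS has no assignment for 0x8740, so this raises
try:
    raw.decode("shift_jis")
except UnicodeDecodeError as e:
    print(e)

If bytes like these appear in your file, reading it as shift-jis will fail or mangle those characters even though cp932 handles them cleanly.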

Try this:

import pandas as pd

# pandas needs the s3fs package installed to read s3:// URLs directly
df_pd = pd.read_csv('s3://path_to_file.csv', encoding='cp932')

or

# Spark's CSV reader takes the charset through the 'encoding' option
df = spark.read.csv('s3://path_to_file.csv', encoding='cp932', header=True)
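Note that Spark resolves the encoding option through the JVM's charset registry rather than Python's codecs, so the accepted names come from Java. If 'cp932' happens not to be recognized in your environment, 'MS932' (or 'windows-31j') is Java's name for the same Windows code page:

df = spark.read.csv('s3://path_to_file.csv', encoding='MS932', header=True)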

Upvotes: 0
