user13000875

Reputation: 495

Getting junk characters when the file encoding is UTF-16LE

I have a CSV file encoded as UTF-16LE. When I try to read data from it, I get junk characters.

To get the file encoding, I use the command below:

file -bi test.csv

It returns text/plain; charset=utf-16le

To read the file data, I use the command below:

head -n1 test.csv | tr '^' ','

It outputs ��colon1,colon2,colon3

Why am I getting these junk characters?

Upvotes: 0

Views: 409

Answers (1)

tshiono

Reputation: 22012

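Because the CSV file is encoded in UTF-16LE, it starts with the BOM (Byte Order Mark), the two bytes 0xff 0xfe. Those bytes are not valid UTF-8, so a UTF-8 terminal shows them as the two junk characters at the start of the line. You can see the bytes with: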

head -n1 test.csv | xxd
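To make the byte layout concrete, here is a minimal sketch that builds a tiny UTF-16LE file and dumps its first line. The sample content (`a^b`) and the variable names are made up for illustration, and `od` is used in place of `xxd` only because it is more widely available.

```shell
# Hypothetical sample: BOM (ff fe) followed by "a^b\n" encoded as UTF-16LE.
sample=$(mktemp)
printf '\377\376a\000^\000b\000\n\000' > "$sample"

# Dump the first line byte by byte (od stands in for xxd here).
# tr -s squeezes od's padding so the bytes are separated by single spaces.
dump=$(head -n1 "$sample" | od -An -tx1 | tr -s ' \n' ' ')
echo "$dump"

rm -f "$sample"
```

The dump should begin with `ff fe` (the BOM), and every ASCII character should be followed by a `00` byte, which is how UTF-16LE stores it.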

UTF-8 is the most commonly used encoding now, and UTF-16 is becoming less common (even on Windows). Your locale most likely defaults to UTF-8 as well, so please try:

iconv -f UTF-16LE -t UTF-8 test.csv | head -n1 | tr '^' ','

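This converts the CSV data to UTF-8 before head and tr process it. Note that with -f UTF-16LE, iconv does not drop the BOM: it is carried over as an invisible U+FEFF character at the start of the output. Specifying -f UTF-16 instead should let iconv read the BOM to detect the byte order and consume it.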
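The conversion can be tried end to end with the sketch below. It is not the exact command from this answer: it assumes `-f UTF-16` rather than `-f UTF-16LE`, so that iconv uses the BOM to pick the byte order and leaves it out of the output. The sample content and variable names are hypothetical.

```shell
# Hypothetical sample: BOM (ff fe) followed by "a^b\n" encoded as UTF-16LE.
sample=$(mktemp)
printf '\377\376a\000^\000b\000\n\000' > "$sample"

# Confirm the file starts with the BOM bytes.
bom=$(head -c 2 "$sample" | od -An -tx1 | tr -d ' \n')

# Assumption: -f UTF-16 (no LE suffix) makes iconv consume the BOM
# instead of passing it through as U+FEFF.
first_line=$(iconv -f UTF-16 -t UTF-8 "$sample" | head -n1 | tr '^' ',')

echo "$bom"
echo "$first_line"

rm -f "$sample"
```

Converting before `head` matters: in UTF-16LE the newline is the byte pair `0a 00`, so running `head -n1` on the raw file cuts that pair in half.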

Upvotes: 2
