I have two files: one is in UTF-8 and the other, I think, is in Windows-1256. One is the training set and the other is the test set, so I want both to use the same encoding.
UTF-8 file:
سلمانی را به توافق بگیر
وقتی یک مرد محترم شصت ساله ، در یک جامه قهوهای رسمی ، خوش لباس ، ولی خیلی خوب نگه داشته
Windows-1256 file (as it currently displays):
äÇåí Èå äãÇíÔÇå ÂËÇÑ åäÑí ÇÍãÏ ØÈÇØÈÇíí
ãæÖæÚ ÂËÇÑ ØÈÇØÈÇíí ãæÑÇä åÓÊäÏ æáí ÏÑ ÈÇØä äíä ÙÇåÑí¡ Çíä
I tried several online tools. When I convert the UTF-8 file to Windows-1256, the result looks completely different from the other file. When I convert the Windows-1256 file to UTF-8, nothing changes at all!
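One way to test the guess that the second file is Windows-1256 rather than UTF-8 is to ask `iconv` to read it as UTF-8: `iconv` exits with a non-zero status when it hits an invalid input sequence. This is a minimal sketch, not a full encoding detector. The helper `is_utf8` and the sample file names are hypothetical. The samples encode the word "سلام" once as UTF-8 and once as Windows-1256 bytes.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the whole file decodes as UTF-8.
# iconv exits non-zero on an invalid byte sequence, so we only use its status.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical sample files containing the word "سلام".
printf '\330\263\331\204\330\247\331\205\n' > sample-utf8.txt  # UTF-8 bytes
printf '\323\341\307\343\n' > sample-1256.txt                   # Windows-1256 bytes

for f in sample-utf8.txt sample-1256.txt; do
  if is_utf8 "$f"; then
    echo "$f: valid UTF-8"
  else
    echo "$f: not valid UTF-8 (possibly a legacy code page such as Windows-1256)"
  fi
done
```

A file that passes this check is not proven to be UTF-8 (pure ASCII passes too). A file that fails it is at least not UTF-8, which fits the Windows-1256 guess.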
The problem is solved. I used this command:
iconv -f UTF-8 -t WINDOWS-1256//TRANSLIT --output=Ham.txt Ham-utf
The problem was that my Windows-1256 file was very large, so I had copied part of it into a separate file named ham-mini to work with. Copying that excerpt is what damaged the text. When I ran the command above on the original file instead, the problem was solved.
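If you would rather unify both sets on UTF-8, the same tool works in the opposite direction. This is a minimal sketch, assuming the Windows-1256 file is named `test-1256.txt` and the output `test-utf8.txt`. Both names are hypothetical, and the sample input is created inline so the script is self-contained. `//TRANSLIT` is left out because UTF-8 can represent every character. It mainly matters when converting into a smaller character set such as Windows-1256.

```shell
#!/bin/sh
# Hypothetical input: the word "سلام" as Windows-1256 bytes (D3 E1 C7 E3).
printf '\323\341\307\343\n' > test-1256.txt

# Convert Windows-1256 -> UTF-8 so the file matches the UTF-8 training set.
iconv -f WINDOWS-1256 -t UTF-8 test-1256.txt > test-utf8.txt

# Display the result (renders correctly on a UTF-8 terminal).
cat test-utf8.txt
```

Run it on the complete original file, not on a copied excerpt. As the answer above found, a partial copy can already be corrupted before `iconv` ever sees it.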