Reputation: 245
I wanted to create a histogram of word counts for a large sample by building a dictionary, and then print the most common words with their counts, which basically means printing a few key/value pairs.
However, many of the words were not in the Latin alphabet, so I did:
    try:
        print key, word_dict[key]
    except:
        print key.encode('utf-8'), word_dict[key],
When the results are printed directly to the command-line interface, the non-Latin words are just unreadable, but the key/value order is maintained.
However, when I print the results into a .txt file, the Arabic words are readable, but the key/value pairs for those words seem to be printed in reverse order: value/key. Chinese characters, however, are printed in the correct order: key/value.
So I wonder: is .txt so "smart" that it recognizes Arabic and prints it in right-to-left order? And moreover, what can I do to keep the key/value order I want?
Upvotes: 1
Views: 525
Reputation: 48599
When the results are printed directly to the command-line interface, the non-Latin words are just unreadable
That could be because your terminal/cmd window is not set to UTF-8, which you can change in the window's settings/preferences.
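One way to check this from the script itself is to look at the encoding Python reports for standard output and test whether a given word can be encoded with it. This is only a sketch: it assumes Python 3 (the question's code is Python 2), and the helper names `describe_stdout_encoding` and `can_encode` are made up for illustration.

```python
import sys


def describe_stdout_encoding(stream=None):
    """Return the encoding Python reports for the stream, or 'unknown'."""
    stream = sys.stdout if stream is None else stream
    return getattr(stream, "encoding", None) or "unknown"


def can_encode(text, encoding):
    """Return True if text can be encoded with the named encoding."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        # LookupError covers an unrecognized encoding name.
        return False


if __name__ == "__main__":
    encoding = describe_stdout_encoding()
    print("stdout encoding:", encoding)
    print("Arabic sample encodable:", can_encode("\u0643\u0644\u0645\u0629", encoding))
```

If the reported encoding is not UTF-8 (or another encoding that covers the words in the sample), unreadable output in the terminal is to be expected even when the data itself is fine.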
However, when I print the results into a .txt file, Arabic words are readable,
The program that opens your text file has a setting that tells it to interpret the bytes saved on disk as UTF-8.
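As for the reversed order: the file itself still stores the key first and the value second. Viewers that implement the Unicode bidirectional algorithm lay out Arabic text right to left and may pull the adjacent number into that run, so only the display is swapped. A possible workaround, sketched below under the assumption of Python 3, is to wrap each key in LEFT-TO-RIGHT MARK characters (U+200E) so the count is laid out after the key. The function names are hypothetical, and whether the marks are honored depends on the viewer.

```python
import io

LRM = u"\u200e"  # LEFT-TO-RIGHT MARK: invisible, strongly left-to-right


def format_pair(key, count):
    """Build one 'key count' line with the key wrapped in LRM characters."""
    return u"{lrm}{key}{lrm} {count}".format(lrm=LRM, key=key, count=count)


def write_counts(handle, word_dict, top_n=10):
    """Write the top_n most common words, one 'key count' pair per line."""
    ranked = sorted(word_dict.items(), key=lambda kv: kv[1], reverse=True)
    for key, count in ranked[:top_n]:
        handle.write(format_pair(key, count) + u"\n")


if __name__ == "__main__":
    sample = {u"\u0643\u0644\u0645\u0629": 3, u"word": 5, u"\u5b57": 1}
    # Open the file explicitly as UTF-8 instead of relying on the default.
    with io.open("word_counts.txt", "w", encoding="utf-8") as out:
        write_counts(out, sample)
```

The marks are extra characters in the file, so anything that reads the file back and compares keys would need to strip them first.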
Upvotes: 1