Unicode Text Not Printing to Python Console/Terminal/Screen

Question

I am trying to do some Python text-parsing programming with Hebrew (Unicode) text from the Torah.

Here is a link to the example text (Genesis) that I am using from Sefaria.org: https://github.com/Sefaria/Sefaria-Export/blob/master/json/Tanakh/Torah/Genesis/Hebrew/Tanach%20with%20Text%20Only.json

I am able to successfully import the JSON data.

I then run the usual data-extraction tests and print the results with print() to examine the data.

In the code below, only the output for KEYS stays on the screen/terminal/console. All the other output (VALUES, ITEMS, and the VALUE for the dictionary key 'text') disappears from the screen (please run the code with the data and see for yourself).

I figure this is some sort of encoding or decoding issue, because the output that disappears is exactly the output containing Hebrew text (e.g. VALUES, ITEMS, and the VALUE for the dictionary key 'text'), so I did a standard sys check and got the following output:

sys.stdin.encoding =  cp1252
sys.stdout.encoding =  cp1252

I figure that I may need to define/encode/decode or do something to allow written output of UTF-8 UNICODE characters (Hebrew) to the Python terminal.
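A minimal sketch of the kind of thing I have in mind, assuming Python 3.7 or later (where `sys.stdout.reconfigure` exists). The helper names `can_encode` and `use_utf8_stdout` are my own, purely for illustration. It first checks whether a Hebrew string can be encoded in cp1252 at all, then tries to switch stdout to UTF-8. I do not know whether this addresses the disappearing output; it only tests the encoding hypothesis.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def use_utf8_stdout():
    """Try to switch stdout to UTF-8; return True if the stream supports it."""
    # reconfigure() is only available on text streams in Python 3.7+;
    # some IDE consoles replace sys.stdout with an object that lacks it.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
        return True
    return False


sample = "בראשית"  # illustrative Hebrew sample

# Hebrew letters are not part of cp1252, so the first check is False.
print("cp1252 can encode sample:", can_encode(sample, "cp1252"))
print("utf-8 can encode sample: ", can_encode(sample, "utf-8"))

if use_utf8_stdout():
    print("stdout encoding is now", sys.stdout.encoding)
```

If reconfiguring inside the script is not an option, setting the `PYTHONIOENCODING` environment variable to `utf-8` before starting Python is another route to the same effect, though whether the terminal's font and code page then display Hebrew correctly is a separate matter.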

Any ideas how to solve this issue?

## IMPORT NECESSARY MODULES
import json
import sys

## CHECK ENCODING AND PRINT/TEST OUTPUT
print("sys.stdin.encoding = ", sys.stdin.encoding)
print("sys.stdout.encoding = ", sys.stdout.encoding)

## READ JSON FILE & IMPORT DATA - UTF8 CODING TO READ HEBREW TEXT
with open('DATA_1GENESIS.json', encoding="utf8") as f:
    json_data = f.read()

## LOADS AND TRANSFORMS JSON DATA TO PYTHON DICTIONARY OBJECT
DictionaryData = json.loads(json_data)
print('\n')
print("IMPORTED JSON DATA TYPE = ", type(DictionaryData))

## LOOP THROUGH DATA AND PRINT
for item in DictionaryData:
    print("ITEM = ", item, type(item), len(item))

## TEST OUTPUT
print('\n')
print("IMPORTED DICTIONARY DATA = ", DictionaryData, type(DictionaryData), len(DictionaryData))

## EXTRACT DICTIONARY KEYS - 'dict_keys' object
k = DictionaryData.keys()
print('\n')
print("KEYS = ",k,type(k),len(k))

## EXTRACT DICTIONARY VALUES - 'dict_values' object
v = DictionaryData.values()
print('\n')
print("VALUES = ",v,type(v),len(v))

## EXTRACT DICTIONARY ITEMS - 'dict_items' object
i = DictionaryData.items()
print('\n')
print("ITEMS = ",i,type(i),len(i))

## EXTRACT VALUE FOR KEY 'text' = DictionaryData['text']
text = DictionaryData['text']
print('\n')
print("TEXT = ", text, type(text), len(text))

EDIT

I just ran a simple test that prints a single line of Unicode Hebrew. Here is the code; it printed the output to the Python screen/terminal/console perfectly. So the question remains: why do the values extracted from the dictionary above disappear after printing to the screen (please try the code with the data to see for yourselves!)?

x = "בראשית ברא אלהים את השמים ואת הארץ"
print("x = ",x)
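Since a single short line prints fine, one way to narrow things down might be to separate the encoding question from the size of the output. The sketch below is only an illustration: `safe_preview` and `dump_utf8` are hypothetical helper names, and the `sample` dictionary is a small stand-in, not the real Genesis data. It prints an ASCII-escaped, truncated preview that any console encoding can show, and writes the full data to a UTF-8 file that can be opened in an editor instead of the console.

```python
import json
import os
import tempfile


def safe_preview(value, limit=200):
    """Return an ASCII-only, truncated repr of `value` that any console can print."""
    escaped = ascii(value)  # non-ASCII characters become \uXXXX escapes
    if len(escaped) <= limit:
        return escaped
    return escaped[:limit] + "..."


def dump_utf8(data, path):
    """Write `data` as JSON to `path` in UTF-8, keeping Hebrew characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


# Hypothetical stand-in for DictionaryData; the real structure may differ.
sample = {"text": ["בראשית ברא אלהים את השמים ואת הארץ"]}

# The preview contains only ASCII, so it does not depend on the console encoding.
print("PREVIEW =", safe_preview(sample))

# Write the full data to a file in the temp directory for inspection.
out_path = os.path.join(tempfile.gettempdir(), "genesis_dump.json")
dump_utf8(sample, out_path)
print("WROTE", out_path)
```

If the escaped preview stays on screen while the raw print does not, that would point toward the console's handling of the Hebrew characters or of very long lines rather than toward the JSON loading step.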
