Reputation: 41
I'm trying to translate a yml file using the googletrans API. This is my code:
#Import
from googletrans import Translator
import re

# API
translator = Translator()

# Counter
counter_DoNotTranslate = 0
counter_Translate = 0

#Translater
with open("ValuesfileNotTranslatedTest.yml") as a_file: #Values file not translated
    for object in a_file:
        stripped_object = object.rstrip()
        found = False
        file = open("ValuesfileTranslated.yml", "a") #Translated file
        if "# Do not translate" in stripped_object: #Dont translate lines with "#"
            counter_DoNotTranslate += 1
            file.writelines(stripped_object + "\n")
        else: #Translates english to dutch and appends
            counter_Translate += 1
            results = translator.translate(stripped_object, src='en', dest='nl')
            translatedText = results.text
            file.writelines(re.split('|=', translatedText, maxsplit=1)[-1].strip() + "\n" )

#Print
print("# Do not translate found: " + str(counter_DoNotTranslate))
print("Words translated: " + str(counter_Translate))
This is the yml file I want to translate:
'Enter a section title'
'Enter a description of the section. This will also be shown on the course details page'
'Title'
'Description'
'Start date'
'End date'
Published
Section is optional
Close discussions?
'Enter a title'
But when I try to run the code I get the following error:
File "/Users/AndreB/Library/Python/3.9/lib/python/site-packages/googletrans/client.py", line 219, in translate
parsed = json.loads(data[0][2])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 339, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
I think the problem is that there are different whitespaces in the yml file, so I tried adding
if stripped_object is None: #This would skip the lines in the yaml file where there are whitespaces
    file.writelines(stripped_object + "\n")
to the code. But I still get the same error message.
Does anyone have an idea how I can fix this?
Upvotes: 2
Views: 6358
Reputation: 5954
There are quite a lot of problems with the code you present, none of which is causing the problem. The problem is, indeed, likely caused by blank lines in the yml file, but your test is incorrect:
"" is None # False
" " is None # also False
not "" # True
not " " # False
not " ".strip() # True
So the correct way to test for a line consisting of zero or more whitespace chars is to take the truthiness of line.strip(). In this case your gate would be:
if not line.strip():
    out.write("\n")
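As a rough sketch of that gate in isolation, the helper below passes blank lines straight through and only calls the translator for lines that have content. The name translate_or_skip and the translate callable are hypothetical stand-ins, not part of googletrans; in real use you would pass something like lambda s: translator.translate(s, src='en', dest='nl').text.

```python
from typing import Callable


def translate_or_skip(line: str, translate: Callable[[str], str]) -> str:
    """Return a bare newline for blank lines; otherwise translate the stripped text."""
    text = line.strip()
    if not text:
        # Empty or whitespace-only: the translation function is never called
        return "\n"
    return translate(text) + "\n"


# Demo with a stand-in translator (upper-casing instead of a network call)
for raw in ["Title\n", "   \n", "\n"]:
    print(repr(translate_or_skip(raw, str.upper)))
```

Injecting the translate function also makes the gate easy to check without hitting the network.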
Which brings me to the other problems with this code, for example shadowing built-in names (object, file).
Here's a draft of what a function might look like which avoids these problems:
from pathlib import Path
from typing import Tuple, Union

from googletrans import Translator

translator = Translator()

def translate_file(infn: Union[str, Path], outfn: Union[str, Path], src="en", dest="nl") -> Tuple[int, int]:
    inpath = Path(infn)
    outpath = Path(outfn)
    translated = 0
    skipped = 0
    # Open both files once, outside the loop
    with inpath.open() as inf, outpath.open("w") as outf:
        for line in inf:
            if not line.strip():
                # Blank line: copy it through without calling the API
                outf.write("\n")
            elif "# Do not translate" in line:
                outf.write(line)
                skipped += 1
            else:
                # .text holds the translated string, as used in the question's code
                result = translator.translate(line.strip(), src=src, dest=dest)
                outf.write(result.text + "\n")
                translated += 1
    return translated, skipped
There are other things you doubtless want to do, and I don't understand your code for handling the response from translator.translate() (doubtless because I have never used the library).
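On the response handling in the question: the pattern '|=' passed to re.split is an alternation between the empty string and '=', so on Python 3.7+ the first split happens at the very start of the string and [-1] is simply the whole text. The sketch below shows that, and one possible replacement under the assumption that the intent was to keep whatever follows the first '='. That intent is a guess, and after_equals is a hypothetical helper.

```python
import re

sample = "key=waarde"

# '|=' means "empty string OR '='"; the empty alternative matches first, at position 0
parts = re.split('|=', sample, maxsplit=1)
print(parts)


def after_equals(text: str) -> str:
    """Return the text after the first '=', or the whole text if there is no '='."""
    head, sep, tail = text.partition("=")
    return (tail if sep else head).strip()


print(after_equals(sample))
print(after_equals("Titel"))
```

str.partition avoids the regex entirely, which sidesteps the question of whether '|' needs escaping.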
Note that if you do actually want to translate real yml, you would be much better first parsing it, then translating the bits of the tree which need translating, and then dumping it back to disk. Working line by line is going to break sooner or later with valid syntax which doesn't work linewise.
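A minimal sketch of that parse, translate, dump approach, assuming PyYAML is installed for the file handling. The names translate_tree and translate_yaml_file are hypothetical, and the translate argument is any str-to-str callable, for example a wrapper around translator.translate(...).text. Note that PyYAML discards comments when parsing, so "# Do not translate" markers would not survive the round trip and would need some other mechanism.

```python
from typing import Any, Callable


def translate_tree(node: Any, translate: Callable[[str], str]) -> Any:
    """Return a copy of a parsed YAML tree with every non-blank string value translated."""
    if isinstance(node, dict):
        # Keys are left alone; only values are walked
        return {key: translate_tree(value, translate) for key, value in node.items()}
    if isinstance(node, list):
        return [translate_tree(item, translate) for item in node]
    if isinstance(node, str) and node.strip():
        return translate(node)
    # Numbers, booleans, None and blank strings pass through unchanged
    return node


def translate_yaml_file(infn: str, outfn: str, translate: Callable[[str], str]) -> None:
    """Parse, translate and dump a YAML file. Assumes PyYAML is installed."""
    import yaml  # third-party (PyYAML); imported here so translate_tree works without it

    with open(infn, encoding="utf-8") as inf:
        tree = yaml.safe_load(inf)
    with open(outfn, "w", encoding="utf-8") as outf:
        yaml.safe_dump(translate_tree(tree, translate), outf,
                       allow_unicode=True, sort_keys=False)


# Demo on an already-parsed tree, with upper-casing standing in for the translator
demo = {"title": "Enter a title", "optional": True, "hints": ["Start date", ""]}
print(translate_tree(demo, str.upper))
```

Because the walk only touches string values, booleans such as optional: true and any numeric settings are written back unchanged.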
Upvotes: 1