Reputation: 1
I have a set of PDF documents in Russian that I need to translate into English, and I need to automate this.
Currently I upload each document to Google Translate, but this takes a lot of time and is not scalable.
Upvotes: -2
Views: 1169
Reputation: 1
Use Python to automate this.
Import the required modules: PyMuPDF (`fitz`) and `GoogleTranslator` from `deep_translator`.
Code:
```python
import fitz  # PyMuPDF
from deep_translator import GoogleTranslator

WHITE = (1, 1, 1)  # RGB white, used to cover the original text
textflags = fitz.TEXT_DEHYPHENATE  # rejoin words hyphenated across lines

# Translate Russian to English
to_en = GoogleTranslator(source="ru", target="en")

doc = fitz.open(r'C:\projects\Trunk\translator\0A1.pdf')
# Put the translation on its own optional content layer so it can be toggled
ocg = doc.add_ocg('English Translation', on=True)

for page in doc:
    blocks = page.get_text("blocks", flags=textflags)
    for block in blocks:
        bbox = block[:4]  # text position (x0, y0, x1, y1)
        text = block[4]  # extracted text
        if not text.strip():
            continue  # nothing to translate in this block
        translated_text = to_en.translate(text)
        # Hide the original text by overlaying a white rectangle
        page.draw_rect(bbox, color=None, fill=WHITE, oc=ocg)
        # Insert the translated text at the same position
        page.insert_textbox(bbox, translated_text, fontname="helv",
                            fontsize=10, color=(0, 0, 0), oc=ocg)

doc.subset_fonts()
doc.save(r'C:\projects\Trunk\translator\translated_english.pdf')
print("Translated PDF saved successfully!")
```
Running this saves a translated copy of your PDF.
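Since the question is about a whole set of documents, the per-file code would need to run in a loop. Here is a minimal sketch of that, assuming the per-file logic has been moved into a function that takes a source and a destination path. The names `translate_folder` and `translate_pdf`, and the `_en` suffix, are made up for illustration.

```python
from pathlib import Path
from typing import Callable, List


def translate_folder(
    src_dir: str,
    dst_dir: str,
    translate_pdf: Callable[[Path, Path], None],
) -> List[Path]:
    """Run translate_pdf on every .pdf in src_dir and return the output paths.

    translate_pdf is a hypothetical callable (src, dst) that holds the
    per-file translation logic.
    """
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).glob("*.pdf")):
        dst = out_dir / f"{src.stem}_en.pdf"
        try:
            translate_pdf(src, dst)
        except Exception as exc:
            # One bad file should not stop the rest of the batch
            print(f"skipped {src.name}: {exc}")
            continue
        written.append(dst)
    return written
```

You would call it as `translate_folder("in", "out", translate_pdf)`. Files that fail are reported and skipped, so a single broken PDF does not abort the run.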
Upvotes: -1
Reputation: 553
(Note: I am unfamiliar with translating documents, but this should get your basic architecture in the right direction).
Based on our brief exchange, I would recommend exploring a process like this:
With that, you would host the docs in a SharePoint List. When a doc is added, a Power Automate flow would trigger, translate the doc, and write out the result. You could either use Microsoft's in-house extraction and translation software (or Power Automate steps/actions), or send an HTTP request to whatever service you like. A Google search for translation or text extraction APIs reveals several options, including Google Translate.
If you have no requirement to use Google Translate (or something else), I would personally stick with the same brand of tech so that there's less headache from working with an outside service, but of course that depends on your requirements. You can initiate HTTP requests with the "HTTP" action.
Within Power Automate, you would use the "When an item is created" SharePoint trigger, then Encodian's "Extract Text from Image" (or something different depending on your file type).
Then simply take the output and pass it to Microsoft Translator, or send it in an HTTP request to wherever you want.
You can then write the translated output wherever you like: another SharePoint List, a database, email, whatever.
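The extract, translate, write steps above hinge on one HTTP call to a translation service. As a rough sketch of what that call might look like outside Power Automate, here is a Python version against the Microsoft Translator v3 REST API. The endpoint, header names, and response shape are written from memory of that API, so check them against Microsoft's reference. `key` and `region` are placeholders for your own Azure resource, and the helper names are made up for illustration.

```python
import json
import urllib.request
from typing import List

# Assumed global endpoint of the Translator v3 REST API
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_request(text: str, key: str, region: str,
                  source: str = "ru", target: str = "en") -> urllib.request.Request:
    """Build the POST request that asks for one text to be translated."""
    url = f"{ENDPOINT}?api-version=3.0&from={source}&to={target}"
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # placeholder credential
        "Ocp-Apim-Subscription-Region": region,  # placeholder Azure region
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_response(payload: bytes) -> List[str]:
    """Pull the first translation out of each item in the JSON response."""
    data = json.loads(payload)
    return [item["translations"][0]["text"] for item in data]


def translate(text: str, key: str, region: str) -> str:
    """Send the request and return the translated text (needs network access)."""
    with urllib.request.urlopen(build_request(text, key, region)) as resp:
        return parse_response(resp.read())[0]
```

In the Power Automate flow, the "HTTP" action would play the role of `build_request` plus `urlopen`, and a parse step would play the role of `parse_response`.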
Upvotes: -1