Reputation: 2341
When looking for a solution to translate text within R
, I got a lot of pretty old answers, proposing to use the package translateR
. The best answer I found is this one.
The answer is 6 years old and in the meantime translateR
is no longer part of the CRAN repository (anyone know why?). I was wondering if there are better options by now, that use a package that is in the CRAN repository.
My example data is as follows;
translate <- data.frame(sentences = c("This needs to be translated to Dutch",
"This also needs to be translated to Dutch",
"Just as this one has to"))
What is currently the best option to translate text within R?
Upvotes: 6
Views: 6032
Reputation: 301
The polyglotr package (https://github.com/Tomeriko96/polyglotr) does what you need without requiring an API key. Using your example:
polyglotr::create_translation_table(
translate,
languages = "en")
Upvotes: 1
Reputation: 2213
Here is another approach that can be considered :
library(RDCOMClient)
library(stringr)
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
text_To_Translate <- "La tutela de Vieux-la-Romaine est une "
text_To_Translate <- str_replace_all(string = text_To_Translate, pattern = "[:space:]", replacement = "%20")
url <- paste0('https://translate.google.com/?hl=fr&sl=fr&tl=en&text=', text_To_Translate, '&op=translate')
IEApp$Navigate(url)
Sys.sleep(10)
doc <- IEApp$document()
web_Obj <- doc$querySelector("#yDmH0d > c-wiz > div > div.ToWKne > c-wiz > div.OlSOob > c-wiz > div.ccvoYb > div.AxqVh > div.OPPzxe > c-wiz.sciAJc > div > div.usGWQd > div > div.lRu31 > span.HwtZe > span > span")
text <- web_Obj$outerHTML()
unlist(stringr::str_extract_all(text, ">.*</span>"))
[1] ">The tutela of Vieux-la-Romaine is a</span>"
Upvotes: 0
Reputation: 2213
Here is another approach with chatGPT with Azure of Microsoft :
library(reticulate)
conda_Env <- conda_list()
if(any(conda_Env[, 1] == "azureGPT") == FALSE)
{
reticulate::conda_create(envname = "azureGPT", packages = c("openai"), python_version = "3.9.16")
}
reticulate::use_condaenv(condaenv = "azureGPT")
openai <- import(module = "openai")
openai$api_type <- "azure"
openai$api_base <- "https://yyy.openai.azure.com/"
openai$api_version <- "2023-07-01-preview"
openai$api_key <- "xxx"
messages <- list(list(role = 'system',
content = 'You will me to translate from english to dutch.'),
list(role = 'user',
content = 'Translate from english to dutch the following sentence : This needs to be translated to Dutch'))
model <- openai$ChatCompletion
response <- model$create(engine = "GPT35",
messages = messages,
temperature = 0,
max_tokens = 350L,
top_p = 0.95,
frequency_penalty = 0,
presence_penalty = 0,
stop = NULL)
response$choices
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Dit moet vertaald worden naar het Nederlands."
},
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
}
}
Upvotes: 0
Reputation: 2213
Here is an approach based on chatGPT which requires an API key :
library(chatgpt)
question <- "Can you translate the following sentence from English to Dutch : This also needs to be translated to Dutch"
Sys.setenv(OPENAI_API_KEY = "xxx")
chatgpt::reset_chat_session()
ask_chatgpt(question)
*** ChatGPT input:
Can you translate the following sentence from English to Dutch : This also needs to be translated to Dutch
[1] "Dit moet ook vertaald worden naar het Nederlands."
Upvotes: 0
Reputation: 2213
Here is an approach that can be used to translate from english to dutch :
library(reticulate)
conda_Env <- conda_list()
if(any(conda_Env[, 1] == "traduction") == FALSE)
{
reticulate::conda_create(envname = "traduction", packages = c("transformers", "SentencePiece"), python_version = "3.9.16")
reticulate::conda_install(envname = "traduction", packages = "torch", pip = TRUE)
}
reticulate::use_condaenv(condaenv = "traduction")
transformers <- import(module = "transformers")
tokenizer <- transformers$AutoTokenizer$from_pretrained("yhavinga/t5-small-24L-ccmatrix-multi")
model <- transformers$AutoModelForSeq2SeqLM$from_pretrained("yhavinga/t5-small-24L-ccmatrix-multi")
translator <- transformers$pipeline("translation_en_to_nl", tokenizer = tokenizer, model = model)
vec_Text <- c("This needs to be translated to Dutch", "This also needs to be translated to Dutch", "Just as this one has to")
translator(vec_Text)
[[1]]
[[1]]$translation_text
[1] "Dit moet vertaald worden naar het Nederlands"
[[2]]
[[2]]$translation_text
[1] "Dit moet ook vertaald worden naar het Nederlands"
[[3]]
[[3]]$translation_text
[1] "Net zoals deze moet"
With this approach, you do not need an app key and it runs locally on your computer.
Upvotes: 0
Reputation: 2213
Here is another approach based on google translate :
library(stringr)
library(pagedown)
library(pdftools)
text_To_Translate <- "La tutela de Vieux-la-Romaine est une "
text_To_Translate <- str_replace_all(string = text_To_Translate, pattern = "[:space:]", replacement = "%20")
url <- paste0('https://translate.google.com/?hl=fr&sl=fr&tl=en&text=', text_To_Translate, '&op=translate')
temp_PDF <- tempfile(fileext = ".pdf")
tryCatch(pagedown::chrome_print(input = url, output = temp_PDF, wait = 2), error = function(e) NA)
translated_Text <- pdf_text(temp_PDF)
translated_Text <- strsplit(translated_Text, split = "\r\n|\n")[[1]]
translated_Text <- translated_Text[c(12, 13)]
translated_Text[1] <- str_remove(string = translated_Text[1], pattern = " clear")
str_split(translated_Text, "[:space:]{20,100}")
[[1]]
[1] "La tutela de Vieux-la-" "The guardian of Vieux-la-"
[[2]]
[1] "Romaine est une" "Romaine is a"
Upvotes: 1
Reputation: 2213
Here is an approach that can be used which basically calls a python library from R :
library(reticulate)
conda_Env <- conda_list()
if(any(conda_Env[, 1] == "traduction") == FALSE)
{
reticulate::conda_create(envname = "traduction", packages = c("transformers"))
}
reticulate::use_condaenv("traduction")
py_run_string("from transformers import pipeline")
py_run_string("translator = pipeline('translation_en_to_fr')")
py_run_string("print(translator('It is easy to translate languages with transformers', max_length=40))")
[{'translation_text': "Il est facile de traduire des langues à l'aide de transformateurs"}]
This approach does not need an API and runs locally. You can also consider the following approach which also runs locally :
library(reticulate)
conda_Env <- conda_list()
if(any(conda_Env[, 1] == "traduction") == FALSE)
{
reticulate::conda_create(envname = "traduction", packages = c("transformers"))
}
reticulate::use_condaenv("traduction")
transformers <- import("transformers")
translator <- transformers$pipeline('translation_en_to_fr')
translator('It is easy to translate languages with transformers', max_length=40)
[[1]]
[[1]]$translation_text
[1] "Il est facile de traduire des langues à l'aide de transformateurs"
Upvotes: 2
Reputation: 51914
You can use the deeplr
package which uses deepl's API. Deepl is supposedly much more accurate than Google translate.
library(deeplr)
translate2(text = translate$sentences,
source_lang = "EN",
target_lang = "NL",
auth_key = "your_key")
#[1] "Dit moet vertaald worden naar het Nederlands"
#[2] "Dit moet ook vertaald worden naar het Nederlands"
#[3] "Net als deze moet"
Upvotes: 8