Reputation: 35
I have spent a lot of time exploring the Google Vision API. I am trying to get the Vision API response in English only. Below is my request object, which includes language hints:
{
  "requests": [
    {
      "features": [
        { "type": "IMAGE_PROPERTIES" },
        { "type": "LANDMARK_DETECTION" },
        { "type": "LABEL_DETECTION" },
        { "type": "WEB_DETECTION" },
        { "type": "FACE_DETECTION" },
        { "type": "SAFE_SEARCH_DETECTION" },
        { "type": "TEXT_DETECTION" },
        { "type": "LOGO_DETECTION" }
      ],
      "image": {
        "source": {
          "imageUri": "https://images.dreamstream.com/prodds/prddsimg/OM_pasteIt22_12_2017_2_34_7806303.jpeg"
        }
      },
      "imageContext": {
        "languageHints": [
          "en"
        ]
      }
    }
  ]
}
Even with this request object, the Vision API does not return the response I expect: it still contains multiple languages.
If there are any steps to get the response in English only, please let me know. As of now, the response contains multiple languages, as shown below:
{
  "url": "https://www.tummyummi.com/food/menu-aryaas-restaurant",
  "pageTitle": "Aryaas India Restaurant - مطعم ارياس لبهند - TummYummi Restaurants",
  "fullMatchingImages": [
    {
      "url": "https://www.tummyummi.com/food/upload/1509868727-Curd-Vada.jpg"
    }
  ]
},
Upvotes: 0
Views: 890
Reputation: 35
Thanks for the useful answer, @dustinroepsch. Rather than relying on the Cloud Translation API, we can use a regex, because the only feature that returns non-English text is WEB_DETECTION, although this may vary.
In WEB_DETECTION, a few objects such as pagesWithMatchingImages and webEntities may contain non-English text. After parsing the JSON, we can use the following regex pattern to remove non-English text.
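// Negated class: matches every character outside the allowed set, so replaceAll(regex, "") strips it.
String regex = "[^a-zA-Z0-9$&+,:;=?@#|'<>.^*()%!\\-\\s]";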
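A minimal sketch of how such a pattern could be applied to a value like pageTitle after the JSON has been parsed. The class name NonEnglishFilter and the method stripNonAscii are hypothetical, and the sketch assumes that "non-English" can be approximated as "anything outside printable ASCII". That assumption also drops accented Latin letters and does not detect the language of what remains.

```java
import java.util.regex.Pattern;

class NonEnglishFilter {
    // Assumption: every character outside printable ASCII (and whitespace) counts as non-English.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x20-\\x7E\\s]");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Removes non-ASCII characters, then collapses the gaps they leave behind.
    static String stripNonAscii(String text) {
        if (text == null) {
            return "";
        }
        String asciiOnly = NON_ASCII.matcher(text).replaceAll("");
        return WHITESPACE_RUN.matcher(asciiOnly).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Title with an Arabic word written as Unicode escapes to avoid source-encoding issues.
        String pageTitle = "Aryaas India Restaurant - \u0645\u0637\u0639\u0645 - TummYummi Restaurants";
        System.out.println(stripNonAscii(pageTitle));
    }
}
```

Separators that surrounded the removed text (the dashes in the title above) are left in place, so some extra cleanup may still be needed depending on the data.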
Upvotes: 0
Reputation: 1160
If I'm understanding correctly, the Vision API looked at your image and determined that it has seen a similar image at https://www.tummyummi.com/food/menu-aryaas-restaurant.
The title of this website is Aryaas India Restaurant - مطعم ارياس لبهند - TummYummi Restaurants.
It is not a bug that this non-English text is being sent to you, because you asked the API to use WEB_DETECTION.
It found a website that has that image, and gave you a link to it and its title.
From the docs, the ImageContext parameter languageHints allows you to set the expected language for text in the image, and will return an error if a specified language is not one of the supported languages:
Text detection returns an error if one or more of the specified languages is not one of the supported languages.
It's important to note that this language setting only affects text detection.
If you want text detection to return only English elements, keep in mind what that document recommends about hints for Latin-alphabet languages:
For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong)
Instead, to filter out any text that is not English, look at the TextAnnotation's locale field and filter out anything that isn't en on the client side.
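The client-side filtering described above could be sketched as follows. This is only an illustration: LocaleFilter and its nested TextAnnotation class are hypothetical stand-ins for whatever type your JSON parser or client library produces, carrying just the locale and description fields. It also assumes that every annotation you want to keep has its locale set to "en"; annotations with a missing or empty locale are dropped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LocaleFilter {
    // Hypothetical stand-in for a parsed text annotation; only the fields used here.
    static final class TextAnnotation {
        final String locale;
        final String description;

        TextAnnotation(String locale, String description) {
            this.locale = locale;
            this.description = description;
        }
    }

    // Keeps only annotations whose locale is exactly "en"; null or other locales are dropped.
    static List<TextAnnotation> keepEnglish(List<TextAnnotation> annotations) {
        return annotations.stream()
                .filter(a -> "en".equals(a.locale))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TextAnnotation> parsed = Arrays.asList(
                new TextAnnotation("en", "Curd Vada"),
                new TextAnnotation("ar", "\u0645\u0637\u0639\u0645"));
        for (TextAnnotation a : keepEnglish(parsed)) {
            System.out.println(a.description);
        }
    }
}
```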
As for detecting the language of the website title returned by WEB_DETECTION, I think that is out of scope of the Google Vision API, but you could try the language detection feature of the Cloud Translation API.
Upvotes: 1