Reputation: 1464
When trying out Google Cloud Vision with the Try Drag and Drop demo, the last tab shows the raw JSON. What parameter do we need to pass to get that data?
I'm currently using DOCUMENT_TEXT_DETECTION, but it only gives data at the level of words, not individual characters.
Edit: I modified this vision test code and changed the feature to ...
feature := &vision.Feature{
    Type: "DOCUMENT_TEXT_DETECTION",
}
and the printing to ...
body, err := json.Marshal(res)
fmt.Println(string(body))
I'm only seeing textAnnotations in the output.
Upvotes: 1
Views: 4348
Reputation: 177
The JSON file contains different things: text, locations, and so on. Your concern is getting the full text. Here is a Python snippet that gets the full text by reading the JSON file; you will find the result you need under data['fullTextAnnotation']['text']. You can get individual characters by breaking the file down into smaller pieces; I believe the JSON file contains them, although I have never worked with them directly.
import json
from pprint import pprint

# 'File Path' is a placeholder for the JSON file saved from the demo
data = json.load(open('File Path'))
pprint(data['fullTextAnnotation']['text'])
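Since the question's code is in Go, here is the same idea as a minimal Go sketch, decoding a saved response file into the generated vision/v1 structs. It assumes the file holds a single AnnotateImageResponse object; "response.json" is a placeholder path of my own, not from the question.

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "os"

    vision "google.golang.org/api/vision/v1"
)

func main() {
    raw, err := os.ReadFile("response.json") // placeholder path
    if err != nil {
        log.Fatal(err)
    }
    // Assumption: the saved file is one AnnotateImageResponse object.
    var resp vision.AnnotateImageResponse
    if err := json.Unmarshal(raw, &resp); err != nil {
        log.Fatal(err)
    }
    // Same field as data['fullTextAnnotation']['text'] in the Python above.
    fmt.Println(resp.FullTextAnnotation.Text)
}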
Upvotes: 2
Reputation: 1098
Using the same Go code template you are using: search for "type Feature struct" in your browser on this page. You can see the following feature types and descriptions:
// Type: The feature type.
//
// Possible values:
//   "TYPE_UNSPECIFIED" - Unspecified feature type.
//   "FACE_DETECTION" - Run face detection.
//   "LANDMARK_DETECTION" - Run landmark detection.
//   "LOGO_DETECTION" - Run logo detection.
//   "LABEL_DETECTION" - Run label detection.
//   "TEXT_DETECTION" - Run text detection / optical character recognition
//     (OCR). Text detection is optimized for areas of text within a larger
//     image; if the image is a document, use `DOCUMENT_TEXT_DETECTION` instead.
//   "DOCUMENT_TEXT_DETECTION" - Run dense text document OCR. Takes precedence
//     when both `DOCUMENT_TEXT_DETECTION` and `TEXT_DETECTION` are present.
//   "SAFE_SEARCH_DETECTION" - Run Safe Search to detect potentially unsafe
//     or undesirable content.
//   "IMAGE_PROPERTIES" - Compute a set of image properties, such as the
//     image's dominant colors.
//   "CROP_HINTS" - Run crop hints.
//   "WEB_DETECTION" - Run web detection.
There is no option that directly returns the JSON tab contents. The JSON tab is the combination of all the other tabs' output, and users tend to ask for just one; for example, someone analyzing faces is usually not interested in text detection.
If you need more than one, you can obtain multiple feature outputs by combining the results for all the feature types you want. Based on that, I added the following lines to your code:
feature2 := &vision.Feature{
    Type:       "LABEL_DETECTION",
    MaxResults: 10,
}
req2 := &vision.AnnotateImageRequest{
    Image:    img,
    Features: []*vision.Feature{feature2},
}
batch2 := &vision.BatchAnnotateImagesRequest{
    Requests: []*vision.AnnotateImageRequest{req2},
}
res2, err := svc.Images.Annotate(batch2).Do()
if err != nil {
    log.Fatal(err)
}
body2, err := json.Marshal(res2)
fmt.Println(string(body2))
I have tested it and it works. You should add a block like this for every feature you are interested in. If you intend to add many of them, I would suggest creating a function or loop to avoid repeating code, as sketched below.
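Here is a rough sketch of that loop idea, my own rather than from the question: a single AnnotateImageRequest can carry several features at once, so you can build the Features slice in a loop instead of repeating the request/batch/marshal block per feature. It assumes the same svc, img, and imports as in the question's code; the featureTypes list and the reqAll/batchAll names are illustrative.

// Sketch: one request carrying several features, assuming the same
// svc (vision.Service) and img (vision.Image) as in the question.
featureTypes := []string{"DOCUMENT_TEXT_DETECTION", "LABEL_DETECTION", "WEB_DETECTION"}

var features []*vision.Feature
for _, t := range featureTypes {
    features = append(features, &vision.Feature{Type: t})
}

reqAll := &vision.AnnotateImageRequest{
    Image:    img,
    Features: features, // all feature types in a single request
}
batchAll := &vision.BatchAnnotateImagesRequest{
    Requests: []*vision.AnnotateImageRequest{reqAll},
}
resAll, err := svc.Images.Annotate(batchAll).Do()
if err != nil {
    log.Fatal(err)
}
bodyAll, err := json.Marshal(resAll)
if err != nil {
    log.Fatal(err)
}
// The combined output is closer to what the demo's JSON tab shows.
fmt.Println(string(bodyAll))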
Anyway, I suggest you fill out the request here to obtain exactly the JSON output (with data at the level of words or letters) by calling the API directly instead of using a client library. I used the following request body to obtain the bounding boxes for the numbers I was interested in:
{
  "requests": [
    {
      "features": [
        {
          "type": "",
          "maxResults": 0
        },
        {
          "type": ""
        }
      ],
      "image": {
        "source": {
          "gcsImageUri": ""
        }
      }
    }
  ]
}
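For completeness, here is a hedged Go sketch of posting a body like that to the REST endpoint directly, which is what "calling the API instead of using a client library" amounts to. The images:annotate URL comes from the API reference; "request.json" and API_KEY are placeholders of my own to fill in.

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
)

func main() {
    // The template above, with type/gcsImageUri filled in.
    body, err := os.ReadFile("request.json")
    if err != nil {
        log.Fatal(err)
    }
    url := "https://vision.googleapis.com/v1/images:annotate?key=API_KEY"
    resp, err := http.Post(url, "application/json", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    out, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    // The raw JSON response, like the demo's JSON tab.
    fmt.Println(string(out))
}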
Upvotes: 0
Reputation: 175
Well, if you look closely, there are various things available in that last tab containing the raw JSON. Based on your requirements you can fetch any of them.
From the response you get from DOCUMENT_TEXT_DETECTION, you can fetch text_annotations, full_text_annotations, etc.
From text_annotations, you can fetch the description, the language of the entire text, each word of the text, numeric digits, special characters, and their respective coordinates.
From full_text_annotations, you can fetch pages, blocks of data, paragraphs, and individual characters, with their respective coordinates and confidence scores.
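To reach those individual characters with the Go client from the question, you can walk the fullTextAnnotation hierarchy (pages, then blocks, paragraphs, words, and finally symbols). A minimal sketch, assuming res is the *vision.BatchAnnotateImagesResponse from the question's Annotate call:

// Walk fullTextAnnotation down to individual characters (symbols),
// assuming res came from svc.Images.Annotate(batch).Do().
for _, r := range res.Responses {
    if r.FullTextAnnotation == nil {
        continue // no dense text detected in this image
    }
    for _, page := range r.FullTextAnnotation.Pages {
        for _, block := range page.Blocks {
            for _, para := range block.Paragraphs {
                for _, word := range para.Words {
                    for _, sym := range word.Symbols {
                        // Each symbol is a single character with its own
                        // bounding box and confidence score.
                        fmt.Printf("%s (confidence %.2f)\n", sym.Text, sym.Confidence)
                    }
                }
            }
        }
    }
}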
Upvotes: 0