unique_beast

Reputation: 1470

Grouping text extracted as full words from the Google Vision API

I am trying to reproduce the output of the "Document Text Detection" sample UI uploader through the Google Vision API. However, the sample code only gives me individual characters as output, whereas I need them grouped into words.

Is there a feature within the library that groups the results into "words" when using the DOCUMENT_TEXT_DETECTION feature or the image.detect_full_text() function in Python?

I am not looking for plain full-text extraction, because my .jpg files are not laid out in a way that the image.detect_text() function handles well.

Google's Sample Code:

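import io

# vision.Client() comes from older google-cloud-vision releases;
# newer releases use vision.ImageAnnotatorClient() instead
from google.cloud import vision


def detect_document(path):
    """Detects document features in an image."""
    vision_client = vision.Client()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision_client.image(content=content)

    # Response hierarchy: pages -> blocks -> paragraphs -> words -> symbols
    document = image.detect_full_text()

    for page in document.pages:
        for block in page.blocks:
            # Collect every word in the block
            block_words = []
            for paragraph in block.paragraphs:
                block_words.extend(paragraph.words)

            # Each symbol is a single character of a word
            block_symbols = []
            for word in block_words:
                block_symbols.extend(word.symbols)

            # Symbols are concatenated with no separator, so word boundaries are lost here
            block_text = ''
            for symbol in block_symbols:
                block_text = block_text + symbol.text

            print('Block Content: {}'.format(block_text))
            print('Block Bounds:\n {}'.format(block.bounding_box))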
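Because each `word` in the response already groups its own `symbols`, words can be rebuilt by joining the symbol texts per word rather than concatenating every symbol in a block. A minimal sketch, assuming the pages -> blocks -> paragraphs -> words -> symbols hierarchy and the `bounding_box` field shown in the output below; `extract_words`, `fake_word` and the `SimpleNamespace` stand-ins are hypothetical, used only so the example runs without calling the API.

```python
from types import SimpleNamespace


def extract_words(document):
    """Return (word_text, bounding_box) pairs from a full-text annotation.

    Assumes the Vision API hierarchy pages -> blocks -> paragraphs -> words
    -> symbols, as returned by image.detect_full_text() in the sample above.
    """
    words = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # Each symbol holds one character; joining them rebuilds the word
                    text = ''.join(symbol.text for symbol in word.symbols)
                    words.append((text, word.bounding_box))
    return words


# Hypothetical stand-ins that mimic the response shape, so the sketch runs offline
def fake_word(text):
    return SimpleNamespace(
        symbols=[SimpleNamespace(text=ch) for ch in text],
        bounding_box=None,
    )


doc = SimpleNamespace(pages=[SimpleNamespace(blocks=[SimpleNamespace(
    paragraphs=[SimpleNamespace(words=[fake_word('Total'), fake_word('PM')])]
)])])

for text, box in extract_words(doc):
    print(text)
```

With a real response, `extract_words(image.detect_full_text())` would be called instead of building `doc` by hand.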

Sample output from Google's off-the-shelf sample:

property {
  detected_languages {
    language_code: "mt"
  }
}
bounding_box {
  vertices {
    x: 1193
    y: 1664
  }
  vertices {
    x: 1206
    y: 1664
  }
  vertices {
    x: 1206
    y: 1673
  }
  vertices {
    x: 1193
    y: 1673
  }
}
symbols {
  property {
    detected_languages {
      language_code: "en"
    }
  }
  bounding_box {
    vertices {
      x: 1193
      y: 1664
    }
    vertices {
      x: 1198
      y: 1664
    }
    vertices {
      x: 1198
      y: 1673
    }
    vertices {
      x: 1193
      y: 1673
    }
  }
  text: "P"
}
symbols {
  property {
    detected_languages {
      language_code: "en"
    }
    detected_break {
      type: LINE_BREAK
    }
  }
  bounding_box {
    vertices {
      x: 1200
      y: 1664
    }
    vertices {
      x: 1206
      y: 1664
    }
    vertices {
      x: 1206
      y: 1673
    }
    vertices {
      x: 1200
      y: 1673
    }
  }
  text: "M"
}


block_words
Out[47]: 
[property {
   detected_languages {
     language_code: "en"
   }
 }
 bounding_box {
   vertices {
     x: 1166
     y: 1664
   }
   vertices {
     x: 1168
     y: 1664
   }
   vertices {
     x: 1168
     y: 1673
   }
   vertices {
     x: 1166
     y: 1673
   }
 }
 symbols {
   property {
     detected_languages {
       language_code: "en"
     }
   }
   bounding_box {
     vertices {
       x: 1166
       y: 1664
     }
     vertices {
       x: 1168
       y: 1664
     }
     vertices {
       x: 1168
       y: 1673
     }
     vertices {
       x: 1166
       y: 1673
     }
   }
   text: "2"
 }

Upvotes: 3

Views: 2611

Answers (2)

Mohammed Jamali

Reputation: 175

There are two text detection types in GCV (Google Cloud Vision): 1. Text Detection and 2. Document Text Detection.

Text detection is used for detecting text in an image; it basically returns the text values found in it. Its accuracy cannot be relied on, so, for example, it is not suited to reading receipts or other document data.

Document text detection, on the other hand, is very accurate and picks up every small detail in the document. In this method, each word is broken down into individual characters (symbols), each with its own coordinates; for example, 03/12/2017 comes back as 0, 3, /, 1, 2, / and so on. This granularity is there for better accuracy.
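To turn the per-character symbols back into readable text, each symbol's `property.detected_break` can be used to decide where spaces and line breaks go (the question's output shows `detected_break { type: LINE_BREAK }` on the "M" symbol). The annotation's own `text` field also holds the full detected text; this sketch is for assembling text yourself, e.g. per block. `rebuild_text`, `_break_type`, `fake_symbol` and the stand-in objects are hypothetical; the numeric values come from the API's `DetectedBreak.BreakType` enum, and the `type_`/`type` fallback is an assumption about how newer (proto-plus) and older clients name that field.

```python
from types import SimpleNamespace

# Numeric values of the Vision API's TextAnnotation.DetectedBreak.BreakType enum
SPACE, SURE_SPACE, EOL_SURE_SPACE, HYPHEN, LINE_BREAK = 1, 2, 3, 4, 5


def _break_type(symbol):
    """Read a symbol's detected break type (0 / UNKNOWN if none)."""
    brk = symbol.property.detected_break
    # Assumption: newer (proto-plus) clients expose the field as `type_`,
    # older protobuf-based ones as `type`
    return int(getattr(brk, 'type_', getattr(brk, 'type', 0)))


def rebuild_text(document):
    """Join all symbols, inserting spaces/newlines according to detected breaks."""
    parts = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        parts.append(symbol.text)
                        brk = _break_type(symbol)
                        if brk in (SPACE, SURE_SPACE):
                            parts.append(' ')
                        elif brk in (EOL_SURE_SPACE, LINE_BREAK):
                            parts.append('\n')
                        elif brk == HYPHEN:
                            # End-of-line hyphen that is not part of the symbol text
                            parts.append('-\n')
    return ''.join(parts)


# Hypothetical stand-ins mimicking the response shape, so the sketch runs offline
def fake_symbol(ch, brk=0):
    return SimpleNamespace(
        text=ch, property=SimpleNamespace(detected_break=SimpleNamespace(type=brk)))


words = [
    SimpleNamespace(symbols=[fake_symbol('1'), fake_symbol('2', SPACE)]),
    SimpleNamespace(symbols=[fake_symbol('P'), fake_symbol('M', LINE_BREAK)]),
]
doc = SimpleNamespace(pages=[SimpleNamespace(blocks=[SimpleNamespace(
    paragraphs=[SimpleNamespace(words=words)])])])

print(repr(rebuild_text(doc)))  # '12 PM\n'
```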

So for your question, you would be better off using the first method, text detection, which returns full words along with their coordinates.
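With text detection, the first entry of `text_annotations` is the whole detected text and each following entry is a single word with its `bounding_poly`. A minimal sketch, assuming a response from `client.text_detection(...)`; `words_with_boxes`, `fake_annotation` and the `SimpleNamespace` stand-ins are hypothetical, used only so the example runs offline.

```python
from types import SimpleNamespace


def words_with_boxes(text_annotations):
    """Split TEXT_DETECTION results into the full text and per-word boxes.

    Returns (full_text, [(word, [(x, y), ...]), ...]).
    """
    annotations = list(text_annotations)
    if not annotations:
        return '', []
    # First entry: the whole detected text; the rest: one entry per word
    full_text = annotations[0].description
    words = [
        (ann.description, [(v.x, v.y) for v in ann.bounding_poly.vertices])
        for ann in annotations[1:]
    ]
    return full_text, words


# Hypothetical stand-ins mimicking response.text_annotations
def fake_annotation(text, box):
    vertices = [SimpleNamespace(x=x, y=y) for x, y in box]
    return SimpleNamespace(description=text,
                           bounding_poly=SimpleNamespace(vertices=vertices))


annotations = [
    fake_annotation('12 PM\n', [(0, 0), (40, 0), (40, 10), (0, 10)]),
    fake_annotation('12', [(0, 0), (15, 0), (15, 10), (0, 10)]),
    fake_annotation('PM', [(20, 0), (40, 0), (40, 10), (20, 10)]),
]

full_text, words = words_with_boxes(annotations)
for word, box in words:
    print(word, box)
```

With a real response this would be `full_text, words = words_with_boxes(response.text_annotations)`.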

Upvotes: 0

ExtractTable.com

Reputation: 811

This answer comes late, but I think you were looking for something like the following.

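import io

from google.cloud import vision


def parse_image(image_path):
    """
    Parse the image with the Google Cloud Vision API using text detection.
    :param image_path: path of the image
    :return: full text content of the image ('' if no text is found)
    :rtype: str
    """
    client = vision.ImageAnnotatorClient()

    # The client helper accepts an open file object and reads its content;
    # the with-block makes sure the file is closed afterwards
    with io.open(image_path, 'rb') as image_file:
        response = client.text_detection(image=image_file)

    if response.error.message:
        raise RuntimeError(response.error.message)

    # text_annotations[0] holds the full text; the remaining entries are individual words
    annotations = response.text_annotations
    return annotations[0].description if annotations else ''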

The function returns the complete text found in the image.

Upvotes: 1
