Yash Arora

Reputation: 88

Extracting text out of images

I am working on extracting text out of images.

Initially the images are colored, with the text in white. After further processing, the text is black and all other pixels are white (with some noise). Here is a sample:

When I run OCR on it with pytesseract (Tesseract), I still don't get any text.

Is there a way to extract text from colored images?

Upvotes: 6

Views: 13536

Answers (2)

Ankit Kumar Rajpoot

Reputation: 5590

Try this:

import cv2
import ftfy
import pytesseract

filename = 'uTGi5.png'
image = cv2.imread(filename)
if image is None:
    raise FileNotFoundError(filename)

# convert to grayscale, then binarize: pixels brighter than 200 become white
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)[1]
# upscale 3x so small glyphs are easier for Tesseract to read
gray = cv2.resize(gray, (0, 0), fx=3, fy=3)
# a median blur removes salt-and-pepper noise
gray = cv2.medianBlur(gray, 9)

# --oem 3: default engine mode, --psm 11: sparse text
config = "-l eng --oem 3 --psm 11"
# pytesseract accepts the NumPy array directly, so no temporary file is needed
text = pytesseract.image_to_string(gray, config=config)
text = ftfy.fix_text(text)
# rejoin words that were hyphenated across line breaks
text = text.replace('-\n', '')
print(text)

Upvotes: 0

Deepan Raj

Reputation: 395

from PIL import Image
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image (OpenCV returns it in BGR channel order)
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# apply an "average" (box) blur with a 3x3 kernel to smooth out noise
blurred = cv2.blur(image, (3, 3))
cv2.imshow("Blurred_image", blurred)

# run OCR on the blurred image
img = Image.fromarray(blurred)
text = pytesseract.image_to_string(img, lang='eng')
print(text)
cv2.waitKey(0)

As a result I get: "Stay: in an Overwoter Bungalow $3»"

What about using contours to find and remove the unnecessary blobs? That might work.

Upvotes: 6
