Saurabh Kumar

Reputation: 127

How to extract tabular data from images?

I have some sample images. How can I extract tabular data from the images and store it in JSON format?

image 1

Upvotes: 0

Views: 1037

Answers (1)

Andy_101

Reputation: 1306

Use pytesseract. The code would look something like the example below, and you can try different modifications. It may not solve the whole problem; it is only an example. It will work for black text, but for blue or any other colour you will have to create a mask for that colour and then extract the data.

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open("temp.jpg")

# thumbnail() resizes in place and returns None, so do not reassign its result
maxsize = (2024, 2024)
im.thumbnail(maxsize, Image.LANCZOS)

# Reduce noise, then boost contrast before binarizing
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)

im = enhancer.enhance(2)
im = im.convert('1')  # 1-bit black and white

im.save('mod_file.jpg')
text = pytesseract.image_to_string(Image.open('mod_file.jpg'))
print(text)

For red colour detection, for example, you can refer to this post. After isolating the red text, binarize the image and then run:

text = pytesseract.image_to_string(Image.open('red_text_file.jpg'))
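A rough sketch of such a red mask, using only Pillow, might look like this. The function name `red_text_mask` and the threshold values are my own assumptions, not taken from the linked post, so expect to tune them for your images. It marks pixels whose hue is close to red (and that are saturated and bright enough) as black on a white background, which is the form Tesseract usually handles best.

```python
from PIL import Image, ImageChops


def red_text_mask(im, hue_tol=12, min_sat=100, min_val=80):
    """Return a mode 'L' image: black where the pixel looks red, white elsewhere.

    hue_tol, min_sat and min_val are illustrative thresholds (assumptions).
    """
    h, s, v = im.convert("RGB").convert("HSV").split()
    # Pillow scales hue to 0-255, and red sits at both ends of that range.
    hue_ok = h.point(lambda x: 255 if x <= hue_tol or x >= 255 - hue_tol else 0)
    sat_ok = s.point(lambda x: 255 if x >= min_sat else 0)
    val_ok = v.point(lambda x: 255 if x >= min_val else 0)
    # multiply() acts as a logical AND on these 0/255 images.
    red = ImageChops.multiply(ImageChops.multiply(hue_ok, sat_ok), val_ok)
    # Invert so the red text becomes black on a white background.
    return ImageChops.invert(red)
```

You could then pass the result straight to OCR, e.g. `pytesseract.image_to_string(red_text_mask(Image.open("temp.jpg")))`, or save it as 'red_text_file.jpg' first. For blue, the same idea should work with the hue test changed to a window around blue instead of the wrap-around test for red.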

Similarly, you will have to repeat the process for blue and so on. I believe you can easily do it yourself; just play around with some values.
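For the JSON part of the question, one possible starting point is to turn the OCR text into a list of records. This is only a sketch under strong assumptions: the helper name `ocr_text_to_json` is hypothetical, the first non-empty line is treated as the header row, and columns are assumed to be separated by a tab or by two or more spaces. Real OCR output is often messier than that, so the splitting rule will probably need adjusting.

```python
import json
import re


def ocr_text_to_json(text):
    """Convert OCR text to a JSON array of row objects (naive sketch)."""
    # Keep non-empty lines; each one is assumed to be one table row.
    rows = [
        re.split(r"\s{2,}|\t", line.strip())
        for line in text.splitlines()
        if line.strip()
    ]
    if not rows:
        return "[]"
    header, body = rows[0], rows[1:]
    # Pair each cell with its column name; extra or missing cells are dropped by zip().
    records = [dict(zip(header, row)) for row in body]
    return json.dumps(records, indent=2)
```

With the `text` variable from the code above, `ocr_text_to_json(text)` would give a JSON string that you can write to a file.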

Upvotes: 1
