Reputation: 127
I have some sample images. How can I extract tabular data from the images and store it in JSON format?
Upvotes: 0
Views: 1037
Reputation: 1306
Use pytesseract. The code will be something like the following. You can try different modifications; my code may not solve the whole problem, it is just an example. It will work for black text, but for blue or any other colour you will have to create a mask accordingly and then extract that data.
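import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open("temp.jpg")
maxsize = (2024, 2024)
# thumbnail() shrinks the image in place and returns None, so do not reassign it.
# Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter.
im.thumbnail(maxsize, Image.LANCZOS)
# Median filter to suppress speckle noise
im = im.filter(ImageFilter.MedianFilter())
# Double the contrast so the text stands out from the background
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
# Binarize to pure black and white
im = im.convert('1')
im.save('mod_file.jpg')
# Run OCR on the preprocessed image
text = pytesseract.image_to_string(Image.open('mod_file.jpg'))
print(text)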
For example, for red colour detection you can refer to this post. After getting the red text, binarize the image and then run:
text = pytesseract.image_to_string(Image.open('red_text_file.jpg'))
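To make the masking step concrete, here is a minimal sketch of one way to isolate text of a single colour. It is not taken from the linked post: the helper names is_dominant and color_mask, and the margin of 60, are my own assumptions, and the right threshold depends on your images.

```python
def is_dominant(pixel, channel, margin=60):
    """Return True if `channel` (0=R, 1=G, 2=B) exceeds both other channels by `margin`.

    The margin of 60 is an assumed starting value, not a tuned one.
    """
    others = [v for i, v in enumerate(pixel[:3]) if i != channel]
    return pixel[channel] - max(others) > margin


def color_mask(im, channel, margin=60):
    """Return a black-on-white image keeping only pixels dominated by `channel`."""
    from PIL import Image  # local import: is_dominant() itself needs no Pillow

    rgb = im.convert("RGB")
    out = Image.new("L", rgb.size, 255)  # start from a white page
    src, dst = rgb.load(), out.load()
    for y in range(rgb.height):
        for x in range(rgb.width):
            if is_dominant(src[x, y], channel, margin):
                dst[x, y] = 0  # paint matching text pixels black
    return out


# Usage (requires Pillow, pytesseract and a Tesseract install):
#   from PIL import Image
#   import pytesseract
#   red = color_mask(Image.open("temp.jpg"), channel=0)
#   red.save("red_text_file.jpg")
#   print(pytesseract.image_to_string(red))
```

The pixel loop is slow on large images but easy to follow. Black and white pixels never match because no single channel dominates in them.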
Similarly, you will have to repeat the same process for blue and so on. I believe you can easily do it yourself; just play around with some values.
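For the JSON part of the question, here is a rough sketch of turning the OCR text into JSON. It assumes the first non-empty line is the header row and that cells are separated by a tab or by two or more spaces. Real OCR output often collapses spacing, so the splitting rule will probably need adjusting. The function name table_text_to_records and the sample text are made up for illustration.

```python
import json
import re


def table_text_to_records(text):
    """Turn OCR'd table text into a list of dicts, one per table row."""
    # Drop the blank lines that OCR output often contains.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    # Assumption: cells are separated by a tab or by two or more spaces.
    rows = [re.split(r"\t|\s{2,}", line) for line in lines]
    header, body = rows[0], rows[1:]
    # zip() stops at the shorter sequence, so extra cells in a ragged row are dropped.
    return [dict(zip(header, row)) for row in body]


# Hypothetical stand-in for the string returned by pytesseract.image_to_string()
sample = "Name    Qty    Price\nApple    3    1.20\n\nPear    10    0.85\n"
records = table_text_to_records(sample)
print(json.dumps(records, indent=2))
```

All values stay as strings here; convert numeric columns yourself once you know which ones they are.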
Upvotes: 1