Reputation: 111
I am fairly new to tesseract. Other people on this forum have had similar problems, but I could not find a satisfying solution, hence this question.
I have pictures from a street camera and I want to get the time stamps of the footage. After cutting out the time stamps they look like this:
I approach this problem by using tesseract with R:
library(tesseract)
library(magick)
eng <- tesseract("eng")
input <- image_read("image from above")
Using basic tesseract I get:
input %>% tesseract::ocr(engine = eng)
# [1] "SRE SAA PRO 206197180731 17:33:88\n"
Obviously, this doesn't help much. Therefore, after reading up on the issue I tried this:
text <- input %>%
  image_resize("2000x") %>%
  image_convert(type = 'Grayscale') %>%
  image_trim(fuzz = 40) %>%
  image_write(format = 'png', density = '300x300') %>%
  tesseract::ocr()
cat(text)
# es bt i deen | ee) eee i ae 2s ee ee ee eee ec ee |
This result is even worse, which is really frustrating. How do I get a correct result? Any help is warmly welcome :)
EDIT
@Max Teflon answered the question for this example. However, I realised that some images are still read incorrectly, such as
Can anyone further improve his solution?
Upvotes: 5
Views: 709
Reputation: 1800
What a nice problem! It was really fun to play around with. I found this solution to work for your example:
library(tesseract)
library(magick)
eng <- tesseract("eng")
input <- image_read("https://i.sstatic.net/0QzhP.jpg") %>%
  .[[1]] %>%
  as.numeric() # because numeric values are easier to work with
image_read(ifelse(input < .9, 1, 0)) # turn every non-white pixel white and every white pixel black
So far so good, here is the black-and-white-version:
Just running OCR on this one did not quite work, so I tried changing its size:
image_read(ifelse(input < .9, 1, 0)) %>%
  image_resize('500x') %>% # make it smaller to work around the errors
  tesseract::ocr()
#> [1] "TLC200 PRO 2019/10/31 17:33:00\n"
The resize width and the contrast threshold (.9) are just the results of playing around. You might want to change them if the solution doesn't work as well on the rest of your pictures.
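To judge whether a given setting works on other pictures, it may help to check the OCR output against the expected timestamp format instead of eyeballing it. The following is only a minimal sketch in base R: `extract_timestamp` is a hypothetical helper (it is not part of tesseract or magick), and it assumes every timestamp looks like `YYYY/MM/DD HH:MM:SS`, as in the example above.

```r
# Hypothetical helper (not part of tesseract or magick): pull a timestamp of the
# assumed form "YYYY/MM/DD HH:MM:SS" out of raw OCR text and check it is a valid date-time.
extract_timestamp <- function(ocr_text) {
  pattern <- "[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}"
  hit <- regmatches(ocr_text, regexpr(pattern, ocr_text))
  # No substring with the expected layout was found
  if (length(hit) == 0) return(NA_character_)
  # as.POSIXct() yields NA for impossible values such as seconds = 88
  parsed <- as.POSIXct(hit, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
  if (is.na(parsed)) NA_character_ else hit
}

extract_timestamp("TLC200 PRO 2019/10/31 17:33:00\n")
#> [1] "2019/10/31 17:33:00"
extract_timestamp("SRE SAA PRO 206197180731 17:33:88\n")
#> [1] NA
```

Applied to the result of `tesseract::ocr()`, an `NA` flags a picture for which the threshold or width probably needs adjusting. A valid-looking timestamp can still contain a misread digit, so this only catches the obvious failures.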
Created on 2020-01-15 by the reprex package (v0.3.0)
Upvotes: 3