Reputation: 306
I am a beginner in R programming and am supposed to write code to read text from images. I am using the tesseract and magick packages for this and am facing an issue where the code converts an "&" to "8:". I have attached the image that I am using as input. [Image used for processing]
Below is the code that I am running
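library(magick)  # image_ocr() also needs the tesseract package to be installed

# Read the image, upscale it, convert to grayscale, trim the borders, then run OCR
test2 <- image_read("C:/Users/admin/Desktop/testimage.jpg") %>%
  image_resize("2000") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_ocr()
cat(test2)
# Save the recognized text to a file
write.table(test2, "C:/Users/admin/Desktop/output2.txt", sep="\t")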
Below is the output that I am getting
No relation between boycotting
panchayat polls 8: Article 35A:
Subramanian Swamy
I have referred to the following source to gain some understanding but did not find any suitable solution for this specific problem.
I have also gone through this website but did not find much help in reading in special characters.
If someone can help me, that would be really helpful.
Upvotes: 6
Views: 342
Reputation: 21
Can you use ImageMagick with a TIF instead of a JPG to do the same? I used the code below and it worked.
test20 <- image_read("E:/xx/image.tif") %>%
  image_resize("4000") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_ocr()
cat(test20)
# Write the OCR result from this run (test20, not test2)
write.table(test20, "E:/xx/output.txt", sep="\t")
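If switching the image format is not an option, a fallback is to clean up the OCR text afterwards. The sketch below is only a workaround, not a fix for the recognition itself: the helper name fix_ocr_ampersand is made up for illustration, and it assumes that a standalone "8:" token (surrounded by whitespace) in your text is always a misread "&". That assumption would be wrong for text that legitimately contains such a token, so check it against your own data.

```r
# Hypothetical helper: replace a standalone "8:" token with "&".
# Assumption: in this text, "8:" surrounded by whitespace is always a misread "&".
fix_ocr_ampersand <- function(text) {
  # (^|\\s) keeps the preceding whitespace (or start of string);
  # the lookahead (?=\\s|$) requires whitespace or end of string after "8:".
  gsub("(^|\\s)8:(?=\\s|$)", "\\1&", text, perl = TRUE)
}

ocr_line <- "panchayat polls 8: Article 35A:"
cleaned <- fix_ocr_ampersand(ocr_line)
cat(cleaned, "\n")
```

Tokens such as "35A:" or "18:" are left alone because the "8:" has to stand by itself. You would apply the helper to the result of image_ocr() before writing it to the output file.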
Upvotes: 1