You've tagged this as a Python question, but rather than provide a complete solution, I'll give you a few pointers to start you on your way. If you need help choosing or using an OCR library, ask that as a separate question. Your question suggests you're looking for a general approach (okay for Stack Overflow) rather than an exhaustive answer along with code.
First, I'd suggest looking for the simplest solution. Maybe that's all you need! If you're not asked to do more than separate the questions, find the most straightforward technique that will do the job (and that you can tolerate debugging).
A few observations:
- Each question with question number, question text, and answers occupies a rectangle of space on the page
- Each question starts with a question number followed by "." and a few spaces and then a word. For example, "58. Which of the..."
- The question numbers are in numeric sequence: 57, 58, 59, 60
- The numbers are vertically aligned: 57 sits above 58, and 59 above 60, with little variation in x position.
- The question numbers are in bold
- The page is reasonably well aligned with the edges of your image.
- There are other numbers in the image, but they don't obey the same rules above.
- There's a nice stretch of vertical white space between the two columns.
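The numbering pattern and the numeric sequence are easy to turn into checks. Here is a minimal sketch, assuming your OCR step has already given you a list of `(text, left, top)` tuples. The tuple layout, the sample data, and the name `find_question_anchors` are made up for illustration and are not tied to any particular OCR library.

```python
import re

# One to three digits, a period, whitespace, then a word: "58. Which ..."
QUESTION_RE = re.compile(r"^(\d{1,3})\.\s+(\w+)")

def find_question_anchors(ocr_lines):
    """Return {question_number: (left, top)} for lines that look like question starts."""
    candidates = []
    for text, left, top in ocr_lines:
        m = QUESTION_RE.match(text.strip())
        if m:
            candidates.append((int(m.group(1)), left, top))
    candidates.sort()

    # Keep only the longest run of consecutive numbers (57, 58, 59, 60),
    # which drops stray matches elsewhere on the page.
    best, run = [], []
    for cand in candidates:
        if run and cand[0] == run[-1][0] + 1:
            run.append(cand)
        else:
            run = [cand]
        if len(run) > len(best):
            best = list(run)
    return {n: (left, top) for n, left, top in best}

# Hypothetical OCR output, standing in for whatever your library returns.
ocr_lines = [
    ("3. Answer all questions", 300, 20),   # matches the pattern, fails the sequence check
    ("57. Which of the following", 52, 100),
    ("A. 12 apples", 80, 140),              # starts with a letter, no match
    ("58. What is the", 50, 600),
    ("59. Select the", 451, 100),
    ("60. How many", 450, 500),
    ("Page 14", 380, 960),
]
anchors = find_question_anchors(ocr_lines)
```

If stray matches still get through on your pages, the alignment observation (question numbers share nearly the same x position within a column) is a natural extra filter.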
To keep the problem simple, consider ways you could find these rectangles:
![four red rectangles, each rectangle surrounding the question number and text](https://i.sstatic.net/tSxmK.png)
In outline:
- Use OCR to read the whole page.
- Your OCR results will include text and bounding boxes for the text
- Search for text fitting the pattern "[number]. [text]". You can use a regular expression to match this pattern.
- As needed, check for your text to meet other criteria for question numbers as described above.
- Time to debug! Check that your code has reduced the OCR results down to the text and locations for just the four question numbers 57, 58, 59, 60.
- Order your text locations counter-clockwise, starting with bottom right: 60, 59, 57, 58. Use the simplest convex hull approach.
- Start with the bottom rightmost result (meaning 60), which is the first element in your ordered list of numbers.
- From the (x,y) point location of the top left of the text "60. ", define a rectangle all the way to the bottom right of the image.
- Choose your next point location, which in your ordered list will correspond to the text "59. "
- From the text location for "59" extend a rectangle down and to the right as before. But crop your rectangle so that it doesn't overlap the existing rectangle. (For functions to ensure rectangles don't overlap, look into "union" and "intersection" functions.)
- For your next item, which happens to be "57" in the counter-clockwise order, extend the rectangle down and to the right. As before, crop so that it doesn't intersect other rectangles. For the moment, this rectangle overlaps the bottom leftmost question, "58."
- For your last item, "58," find the rectangle that extends down and to the right, and that does not intersect any other rectangles.
- If you want to separate the two bottom rectangles (for 58 and 60) from the bottom of the page, you can look for multiple horizontal rows of pixels that are largely white. A number of these rows separate the bottommost questions from the page number and other text.
- and so on
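For a two-column page like this one, you can get the same non-overlapping rectangles with a simplified variant of the steps above. Instead of ordering the points and cropping one rectangle at a time, each rectangle ends at the next column to its right and at the next question number below it in the same column. This is a sketch, not the only way to do it. The coordinates, the page size, the `x_tol` tolerance, and the names `Rect`, `question_rects`, and `overlaps` are all assumptions for illustration.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int

def question_rects(anchors, page_w, page_h, x_tol=20):
    """anchors: {label: (x, y)}, the top-left corner of each question number."""
    rects = {}
    for label, (x, y) in anchors.items():
        # Right edge: nearest anchor clearly to the right, else the page edge.
        rights = [ax for ax, _ in anchors.values() if ax > x + x_tol]
        right = min(rights, default=page_w)
        # Bottom edge: nearest anchor below in the same column, else the page edge.
        belows = [ay for ax, ay in anchors.values()
                  if abs(ax - x) <= x_tol and ay > y]
        bottom = min(belows, default=page_h)
        rects[label] = Rect(x, y, right, bottom)
    return rects

def overlaps(a, b):
    """True if two rectangles share any interior area (touching edges don't count)."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)

# Made-up anchor positions on an 800 x 1000 pixel page.
anchors = {57: (50, 100), 58: (50, 600), 59: (450, 100), 60: (450, 500)}
rects = question_rects(anchors, page_w=800, page_h=1000)
for label, r in sorted(rects.items()):
    print(label, r)
```

The `overlaps` helper is handy as a debugging check: no pair of question rectangles should overlap once you're done.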
You may find a simpler technique than this, but at the very least this should give you some ideas about solving the problem with OCR, regular expressions, and some geometry.
Good luck!
Finding the four question numbers and first word of text:
![each of four question numbers and first word of text found](https://i.sstatic.net/B4dpa.png)
The bottom right rectangle:
![bottom right rectangular region highlighted](https://i.sstatic.net/EC045.png)
The top right rectangle:
![top right rectangular region highlighted](https://i.sstatic.net/ZNRO2.png)
The top left rectangle, which initially overlaps the bottom left question:
![top left rectangular region, which initially stretches over the bottom left question](https://i.sstatic.net/fQnHG.png)
The bottom left rectangle:
![bottom left rectangle](https://i.sstatic.net/lGiaO.png)
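The two bottom rectangles still run to the bottom of the page. To trim them using the white-row idea mentioned in the list above, something like the following sketch could work. It assumes the page is available as a grayscale image stored as a list of rows of 0-255 values. The thresholds and the name `white_row_runs` are guesses you would tune for your own scans.

```python
def white_row_runs(gray, white=200, min_fraction=0.98, min_rows=3):
    """Return (start, end) row ranges, end exclusive, where rows are mostly white.

    gray: list of rows, each a list of 0-255 grayscale values.
    """
    runs, start = [], None
    for y, row in enumerate(gray):
        is_white = sum(p >= white for p in row) >= min_fraction * len(row)
        if is_white and start is None:
            start = y                      # a white band begins
        elif not is_white and start is not None:
            if y - start >= min_rows:      # ignore bands that are too thin
                runs.append((start, y))
            start = None
    if start is not None and len(gray) - start >= min_rows:
        runs.append((start, len(gray)))    # band reaching the bottom edge
    return runs

# Tiny synthetic "page": 4 text rows, 5 blank rows, 2 footer rows.
text_row = [0] * 5 + [255] * 5
blank_row = [255] * 10
page = [text_row] * 4 + [blank_row] * 5 + [text_row] * 2
gaps = white_row_runs(page)
```

On a real page you would run this over the lower part of each bottom rectangle and cut the rectangle at the start of the last white band above the page number.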