How to convert a human-readable timeline to table using existing ML tools?


I have this timeline from a newspaper published by my Native American tribe. I tried using AWS Textract to produce some kind of table from it, but Textract does not recognize any tables in it, so I don't think that approach will work (perhaps more is possible if I pay, but nothing I saw says so).
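Even when Textract's table detection finds nothing, its plain text-detection output (the `DetectDocumentText` API) still returns every `LINE` block with a bounding box. A minimal sketch, assuming the timeline is printed as a date column beside a description column: group lines whose vertical positions are close into rows, then order each row left to right. The function name and the `row_tolerance` value are hypothetical and would need tuning per scan.

```python
from typing import Any


def lines_to_rows(response: dict[str, Any], row_tolerance: float = 0.01) -> list[list[str]]:
    """Group Textract LINE blocks into table-like rows by vertical position.

    `response` is assumed to be the parsed JSON that DetectDocumentText
    returns for one page. BoundingBox coordinates are fractions of the
    page size (0.0 to 1.0).
    """
    lines = [b for b in response.get("Blocks", []) if b.get("BlockType") == "LINE"]
    # Sort top-to-bottom so lines on the same visual row end up adjacent.
    lines.sort(key=lambda b: b["Geometry"]["BoundingBox"]["Top"])

    rows: list[list[dict[str, Any]]] = []
    for block in lines:
        top = block["Geometry"]["BoundingBox"]["Top"]
        first_top = rows[-1][0]["Geometry"]["BoundingBox"]["Top"] if rows else None
        # Join the current row if this line starts within the (hypothetical)
        # tolerance of the row's first line; otherwise start a new row.
        if first_top is not None and abs(top - first_top) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])

    # Within each row, order the cells left to right and keep only their text.
    return [
        [b["Text"] for b in sorted(row, key=lambda b: b["Geometry"]["BoundingBox"]["Left"])]
        for row in rows
    ]
```

The input could come from boto3, e.g. `boto3.client("textract").detect_document_text(Document={"Bytes": page_bytes})`, where `page_bytes` is a page you have loaded yourself. Skewed or multi-line entries would likely need a looser tolerance or extra merging.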

Ultimately, I am trying to sift through all the archived newspapers and download the timelines for all of our election cycles (both "general" and "special advisory") so I can find the number of days between consecutive items in each timeline.
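Once a timeline has been turned into (date, description) pairs, the day counts are simple date arithmetic. A small sketch, assuming the dates are already parsed into `datetime.date` objects; `days_between` is a hypothetical helper name.

```python
from datetime import date


def days_between(timeline: list[tuple[date, str]]) -> list[tuple[str, str, int]]:
    """Return (earlier event, later event, gap in days) for each consecutive pair.

    Assumes `timeline` holds (date, description) pairs from one election's
    notice. It is sorted by date first so every gap is non-negative even if
    the source listed items out of order.
    """
    ordered = sorted(timeline, key=lambda item: item[0])
    return [
        (prev_desc, next_desc, (next_date - prev_date).days)
        for (prev_date, prev_desc), (next_date, next_desc) in zip(ordered, ordered[1:])
    ]
```

Running this per notice and tagging each result with its cycle ("general" or "special advisory") would let you compare cycles side by side.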

Since this is all in the public domain, I see no reason I can't paste a picture of the table here. I will include the download URL for the document as well.

Download URL: Download

On Windows, I started by using Foxit Reader to locate the timelines in individual documents.
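Locating timelines one document at a time gets slow across a whole archive. A rough sketch, assuming each PDF has already been converted to a `.txt` file (for example with poppler's `pdftotext`): scan those files for timeline-like headings. The keyword pattern is a guess and should be adjusted to the wording the newspaper actually uses.

```python
import re
from pathlib import Path

# Hypothetical headings; \s+ also matches line breaks that OCR may insert.
TIMELINE_PATTERN = re.compile(
    r"\b(election\s+timeline|election\s+calendar|important\s+dates)\b",
    re.IGNORECASE,
)


def find_timeline_files(text_dir: str) -> list[str]:
    """Return names of .txt files in `text_dir` that mention a timeline keyword."""
    matches = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        # errors="replace" keeps odd OCR bytes from aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if TIMELINE_PATTERN.search(text):
            matches.append(path.name)
    return matches
```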

Then, on Ubuntu, I used ocrmypdf to make sure all of these documents are searchable (ocrmypdf --skip-text Notice_of_Special_Election_2023.pdf.pdf ./output/Notice_of_Special_Election_2023.pdf).
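The same `ocrmypdf --skip-text` call can be applied to a whole folder of PDFs. A minimal sketch that shells out to the ocrmypdf command line once per file, assuming `ocrmypdf` is on the PATH. `--skip-text` skips OCR on pages that already contain text but keeps them in the output. The function names and directory layout are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_ocr_commands(input_dir: str, output_dir: str) -> list[list[str]]:
    """Build one `ocrmypdf --skip-text IN OUT` command per PDF in input_dir."""
    out = Path(output_dir)
    return [
        ["ocrmypdf", "--skip-text", str(pdf), str(out / pdf.name)]
        for pdf in sorted(Path(input_dir).glob("*.pdf"))
    ]


def run_all(input_dir: str, output_dir: str) -> None:
    """Run ocrmypdf on every PDF, reporting failures instead of stopping."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_ocr_commands(input_dir, output_dir):
        # check=False so one bad scan doesn't abort the batch;
        # a non-zero exit code signals that this file failed.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"ocrmypdf failed ({result.returncode}): {cmd[2]}", file=sys.stderr)
```

Usage would be something like `run_all("archive", "output")`, with both folder names as placeholders.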

This morning I happened to see an ad for AWS Textract in my Google News feed, and it looked powerful. But when I tried it, it didn't actually detect these human-readable timelines.
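Because the OCR step already gives these PDFs a text layer, a non-ML route may be enough: extract the text (for example with `pdftotext -layout`) and match each timeline entry with a regular expression. This is a minimal sketch that assumes each entry starts with a full date such as "March 1, 2023", optionally preceded by a weekday and followed by a separator. The pattern, helper names, and CSV columns are hypothetical and would need adjusting to the newspaper's real layout.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical entry shape: [Weekday,] Month D, YYYY [- or : or dash] Event
DATE_LINE = re.compile(
    r"^(?:[A-Za-z]+day,?\s+)?"                   # optional weekday, e.g. "Monday,"
    r"(?P<date>[A-Z][a-z]+\s+\d{1,2},\s*\d{4})"  # full month name, day, year
    r"\s*[-:\u2013\u2014]?\s*"                   # optional separator
    r"(?P<event>.+)$"                            # rest of the line is the event
)


def parse_timeline(text: str) -> list[tuple[str, str]]:
    """Return (ISO date, event) rows for every line that looks like an entry."""
    rows = []
    for line in text.splitlines():
        m = DATE_LINE.match(line.strip())
        if not m:
            continue  # headings, OCR noise, wrapped lines, etc.
        raw = re.sub(r",\s*", ", ", m.group("date"))
        try:
            parsed = datetime.strptime(raw, "%B %d, %Y").date()
        except ValueError:
            continue  # capitalized word that isn't a month name
        rows.append((parsed.isoformat(), m.group("event").strip()))
    return rows


def to_csv(rows: list[tuple[str, str]]) -> str:
    """Serialize parsed rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "event"])
    writer.writerows(rows)
    return buf.getvalue()
```

Lines that don't match are silently skipped, so it is worth comparing the row count against the printed timeline for each notice.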

I'm hoping someone knows whether any ML tools, or other kinds of solutions, exist for this type of problem.

Mainly, I'm trying to keep my technical skills sharp. I was sick for the last two years, and this is a fun problem to tackle that I think is fairly niche.
