Extract / scrape data from a PDF with Python

Question

I am trying to automate a task. I have a PDF file that I download from a website.

For example, the following PDF: https://www.dgssi.gov.ma/sites/default/files/vulnerabilites_affectant_plusieurs_produits_de_cisco_13.pdf

I want to collect the information from the file as a Python dictionary of the form {'bold sentence': 'the sentences after the bold sentence'}

Example: {....... , 'Solution': 'Veuillez se référer aux bulletins de sécurité de Cisco pour mettre à jours vos équipements', .....}

I already tried converting the PDF to HTML and scraping the result, but I could not tell the different parts of the document apart because all the HTML tags are similar.

If you can suggest an approach, or some code that extracts the content as a dictionary, I would be very grateful.

Any help would be appreciated, and if I need to be more specific let me know.


Answers (1)
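A minimal sketch of one way to build the dictionary the question asks for, not necessarily the method of the original answer. It assumes the PyMuPDF library (imported as `fitz`) is installed and that the headings in the PDF are stored as real bold text rather than as an image. The function names `group_by_bold` and `pdf_spans` and the file name `bulletin.pdf` are hypothetical. The idea is to read every text span with its font information, treat bold spans as keys, and attach the non-bold text that follows each key as its value.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

BOLD_FLAG = 16  # bit 4 of a PyMuPDF span's "flags" field marks bold text


def group_by_bold(spans: Iterable[Tuple[str, bool]]) -> Dict[str, str]:
    """Map each run of bold text to the non-bold text that follows it."""
    result: Dict[str, str] = {}
    key_parts: List[str] = []
    body_parts: List[str] = []
    for text, is_bold in spans:
        text = text.strip()
        if not text:
            continue
        if is_bold:
            # Bold text arriving after a body closes the previous entry.
            if body_parts:
                result[" ".join(key_parts)] = " ".join(body_parts)
                key_parts, body_parts = [], []
            # Consecutive bold spans are joined into one heading.
            key_parts.append(text)
        elif key_parts:
            body_parts.append(text)
        # Non-bold text before the first heading is ignored.
    if key_parts:
        result[" ".join(key_parts)] = " ".join(body_parts)
    return result


def pdf_spans(path: str) -> Iterator[Tuple[str, bool]]:
    """Yield (text, is_bold) for every text span in the PDF at `path`."""
    import fitz  # PyMuPDF; imported here so group_by_bold works without it

    with fitz.open(path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                # Image blocks have no "lines" entry, so they are skipped.
                for line in block.get("lines", []):
                    for span in line["spans"]:
                        is_bold = bool(span["flags"] & BOLD_FLAG)
                        # Assumption: some PDFs only signal bold in the font name.
                        is_bold = is_bold or "bold" in span["font"].lower()
                        yield span["text"], is_bold
```

Usage would look like `data = group_by_bold(pdf_spans("bulletin.pdf"))`. Note that a bold word inside a paragraph would start a new key with this heuristic, so the result may need filtering (for example by font size, also available on each span as `span["size"]`).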

EDIT - adding another approach

Converting to DOC
