Is there a Python module I can use to correct words that have random spaces in them?

Question

I'm analysing a PDF, and for some reason many of the words have random spaces inside them, or no space between them, once I extract the text in Python. I'm using PdfReader from PyPDF2.

Examples: Y ou’re sweet, but I feel fine. I wish I feltas calmas you look.

The strange thing is that the spacing looks correct in the PDF itself; the extra or missing spaces only appear after I extract the text in Python.

So my proposed solution is a grammar or spellchecking module that would look at text like 'y ou' and turn it into 'you' (and 'asif' into 'as if'). Ideally I could enable only that spacing correction, because I don't want it to change anything else in the text.
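To make the idea concrete, here is a rough sketch of the behaviour I'm after, using only the standard library. It only ever adds or removes spaces, never letters. Everything in it is made up for illustration: VOCAB is a tiny hypothetical word list (a real run would need a full dictionary), and key, known, split_token and respace are names I invented, not functions from any existing module.

```python
import string

# Hypothetical mini-vocabulary for illustration; a real run would load a full word list.
VOCAB = {"you're", "you", "sweet", "but", "i", "feel", "fine", "wish",
         "felt", "as", "calm", "look", "if"}


def key(token):
    """Normalise a token for lookup: straight apostrophe, no outer punctuation, lowercase."""
    return token.replace("\u2019", "'").strip(string.punctuation).lower()


def known(token):
    """Return True if the normalised token is in the vocabulary."""
    return key(token) in VOCAB


def split_token(token):
    """Split an unknown token into two known words if possible, else leave it alone."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if known(left) and known(right):
            return [left, right]
    return [token]


def respace(text):
    """Repair spacing only: join stray fragments, then split fused words."""
    tokens = text.split()
    merged = []
    i = 0
    while i < len(tokens):
        # Join two neighbours only if at least one is unknown
        # and the joined form is a known word.
        if (i + 1 < len(tokens)
                and not (known(tokens[i]) and known(tokens[i + 1]))
                and known(tokens[i] + tokens[i + 1])):
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    fixed = []
    for tok in merged:
        # Known words pass through untouched; unknown ones get one split attempt.
        fixed.extend([tok] if known(tok) else split_token(tok))
    return " ".join(fixed)


print(respace("Y ou\u2019re sweet, but I feel fine. I wish I feltas calmas you look."))
```

With this toy vocabulary the example sentence comes back as "You’re sweet, but I feel fine. I wish I felt as calm as you look." and 'asif' becomes 'as if'. The obvious weakness is that the result depends entirely on the word list, and a token that can be split in more than one way is split at the first match.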

I welcome any other solutions, for example a different way of extracting the text from the PDF.

My current code looks like this:

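from PyPDF2 import PdfReader

def all_pages1(num, start, stop):
    global file  # kept global, presumably so main() can write to it
    # The with statements close both files, so no explicit close() is needed
    with open(f'example{num}.txt', 'w') as file:
        path = "C:/example.pdf"
        with open(path, mode='rb') as file2:
            reader = PdfReader(file2)
            for page in range(start, stop):
                page1 = reader.pages[page]
                # extract_text() replaces the deprecated extractText()
                text = page1.extract_text()
                main(num, text)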

main() does the actual searching, which isn't relevant to my problem.

