Reputation: 23
I am wondering how I would write a Python script to carry out the following set of steps: (1) open a typical .docx, (2) select all, (3) copy to clipboard, (4) store as a string.
I don't care about preserving any formatting, nor about graphics, nor about tables. I just want the text stored as a gigantic string, for parsing and analysis.
Upvotes: 1
Views: 3858
Reputation: 4488
Since you are talking about a docx you could consider using python-docx https://python-docx.readthedocs.io/en/latest/
According to the documentation you could write something like this
def getText(filename):
doc = docx.Document(filename)
fullText = []
for para in doc.paragraphs:
fullText.append(para.text)
return '\n'.join(fullText)
To get all the text then using something like pyperclip
you could copy it to clipboard.
So without trying it i would imagine something like
import docx
import pyperclip
textInFile = getText("yourDoc.docx")
pyperclip.copy(textInFile)
https://github.com/asweigart/pyperclip
Upvotes: 1
Reputation: 328
There are libraries to help with this. Take a look at python-docx
, which despite being oriented towards creating and updating docx
files will allow you to read the contents of a document.
This answer HERE might help you start, but is by no means complete.
Here's a link to the python-docx documentation.
Upvotes: 0