user17285516

python: find numbers in docx file and replace

I want to read a .docx file in Python and then extract the numbers from it, something like:

with open('test.docx') as t:
    text = t.readlines()
a = []
a.append([int(s) for s in text.split() if s.isdigit()])
a = [int(numeric_string) for numeric_string in a]

Thanks for any help.

Upvotes: 0

Views: 533

Answers (1)

ljdyer

Reputation: 2086

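A .docx file is a zipped XML package, so open() and readlines() won't give you its text. You can use the python-docx library (imported as docx) to read the content of .docx files.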

pip install python-docx

Adapting some code from here and combining it with the code you posted, I got:

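import docx

def getText(filename):
    # Load the document and collect the text of each body paragraph.
    # Note: doc.paragraphs does not include text inside tables.
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

text = getText('Doc1.docx')

# Keep only whitespace-separated tokens made up entirely of digits.
a = [int(s) for s in text.split() if s.isdigit()]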

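This worked for me with a simple test file, although you may need to adjust it depending on how you want the number search to work: str.isdigit() only accepts whitespace-separated tokens made up entirely of digits, so values such as "12," or "3.5" are skipped.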
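
If the numbers in your document can sit next to punctuation, a regular expression may be a better fit than split() plus isdigit(). Here is a minimal sketch that works on a plain string, such as the one returned by getText. The names extract_numbers and replace_numbers and the sample sentence are made up for illustration. The pattern only matches unsigned integers, so "3.5" would come out as 3 and 5. It also does not write anything back into the .docx file.

```python
import re

# Hypothetical helpers: the names and the integer-only pattern are assumptions,
# not part of python-docx or of the original answer.
NUMBER = re.compile(r"\d+")

def extract_numbers(text):
    """Return every run of digits in text as an int, even next to punctuation."""
    return [int(m) for m in NUMBER.findall(text)]

def replace_numbers(text, transform):
    """Return text with each run of digits replaced by str(transform(n))."""
    return NUMBER.sub(lambda m: str(transform(int(m.group()))), text)

# Made-up sample standing in for the text read from the document.
sample = "Order 12, shipped 3 items (ref 2021)."
numbers = extract_numbers(sample)
doubled = replace_numbers(sample, lambda n: n * 2)
print(numbers)
print(doubled)
```

With this sample, the split() and isdigit() approach would find only 3, because "12," and "2021)." contain punctuation.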

Upvotes: 1
