zerohedge

Reputation: 3725

Python & MS Word: Convert .doc to .docx?

I found several questions that were similar to mine, but none of the answers came close to what I need.

Specifications: I'm working with Python 3 and do not have MS Word. My development machine runs OS X, and the cloud machine runs Linux (Ubuntu).

I'm using python-docx to extract values from a .doc file that is sent to me nightly. However, python-docx only works with .docx files, so I need to convert the file to that extension first.

So, I've got a .doc file that I need to convert to .docx. This script might have to run in the cloud so I can't install any kind of Office or Office-like software. Can this be done?

Upvotes: 14

Views: 19789

Answers (5)

DrIDK

Reputation: 7944

Use Apache Tika and pandoc

java -jar tika-app-3.0.0.jar -T test.doc | pandoc -o test.docx
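Since the question asks about Python, here is a rough sketch of driving the same pipeline with subprocess. It assumes java and pandoc are on the PATH and that the Tika jar sits in the working directory; the function names are made up for illustration. Note that -T makes Tika emit plain text, so the resulting .docx will most likely keep the text but not the original formatting.

```python
import subprocess


def tika_pandoc_commands(src, dest, tika_jar="tika-app-3.0.0.jar"):
    """Return the two argv lists equivalent to: tika ... | pandoc -o dest."""
    tika_cmd = ["java", "-jar", tika_jar, "-T", src]  # plain-text extraction
    pandoc_cmd = ["pandoc", "-o", dest]               # reads stdin, writes dest
    return tika_cmd, pandoc_cmd


def tika_to_docx(src, dest, tika_jar="tika-app-3.0.0.jar"):
    """Run Tika and pipe its output into pandoc (hypothetical helper)."""
    tika_cmd, pandoc_cmd = tika_pandoc_commands(src, dest, tika_jar)
    tika = subprocess.Popen(tika_cmd, stdout=subprocess.PIPE)
    # pandoc reads Tika's stdout directly, like the shell pipe above
    pandoc = subprocess.run(pandoc_cmd, stdin=tika.stdout)
    tika.stdout.close()
    tika_rc = tika.wait()
    if tika_rc != 0 or pandoc.returncode != 0:
        raise RuntimeError(
            f"conversion failed (tika={tika_rc}, pandoc={pandoc.returncode})"
        )
    return dest
```

Calling tika_to_docx("test.doc", "test.docx") should behave like the one-liner above.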

Upvotes: 0

thrinadhn

Reputation: 2503

Since you are working with Linux/Ubuntu, you can use LibreOffice's built-in converter.

Syntax:

lowriter --convert-to docx *.doc

Example:

lowriter --convert-to docx testdoc.doc

The wildcard form converts every .doc file in the current directory to .docx and saves the results in the same folder. Tested on Ubuntu.
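To call this from a nightly Python job, something like the sketch below might work. It assumes LibreOffice is installed and lowriter is on the PATH; the helper names are hypothetical, and --headless and --outdir are standard LibreOffice options that are added here for unattended use, not part of the command above.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, binary="lowriter"):
    """Return the argv list for a headless .doc -> .docx conversion."""
    return [
        binary,
        "--headless",            # no GUI, suitable for a server
        "--convert-to", "docx",
        "--outdir", str(outdir),
        str(src),
    ]


def convert_doc_to_docx(src, outdir=None, binary="lowriter"):
    """Convert one .doc file and return the path of the .docx output."""
    src = Path(src)
    outdir = Path(outdir) if outdir is not None else src.parent
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    subprocess.run(build_convert_command(src, outdir, binary), check=True)
    out = outdir / (src.stem + ".docx")
    # Do not rely on the exit code alone; confirm the file was written
    if not out.exists():
        raise RuntimeError(f"expected output {out} was not created")
    return out
```

The returned path can then be passed straight to python-docx, e.g. docx.Document(convert_doc_to_docx("testdoc.doc")).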

Upvotes: 16

TriveniReddy

Reputation: 11

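import os
import aspose.words as aw

path1 = "doc file path"                # folder containing the .doc file
path2 = "path to save converted file"  # output folder
file = "input.doc"                     # placeholder: name of the .doc file in path1
file2 = file.rsplit('.', 1)[0] + '.docx'
filename = os.path.join(path1, file)
filename1 = os.path.join(path2, file2)
# The save format is taken from the .docx extension
doc = aw.Document(filename)
doc.save(filename1)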

Upvotes: 0

Tilal Ahmad

Reputation: 939

Aspose.Words Cloud SDK for Python can convert DOC to DOCX. The package can open, generate, edit, split, merge, compare and convert a Word document in Python on any platform without depending on MS Word.

It is a paid product, but the free plan provides 150 free monthly API calls.

P.S. I'm a developer evangelist at Aspose.

# Import module
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile

# Get your credentials from https://dashboard.aspose.cloud (free registration is required).
words_api = asposewordscloud.WordsApi(app_sid='xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx',app_key='xxxxxxxxxxxxxxxxxxxxxxxxx')
words_api.api_client.configuration.host = 'https://api.aspose.cloud'

filename = 'C:/Temp/02_pages.doc'
dest_name = 'C:/Temp/02_pages.docx'
# Convert DOC to DOCX
request = asposewordscloud.models.requests.ConvertDocumentRequest(document=open(filename, 'rb'), format='docx')
result = words_api.convert_document(request)
copyfile(result, dest_name)

Upvotes: 0

pablissimo77

Reputation: 39

You could use unoconv (Universal Office Converter), which converts between any document formats supported by LibreOffice/OpenOffice.

From the shell:

unoconv -d document --format=docx *.doc

From Python:

import subprocess
subprocess.call(['unoconv', '-d', 'document', '--format=docx', filename])

Upvotes: 3
