Edem Devlet

Reputation: 71

How to read/rewrite a .doc file as XML in Node.js?

I need to read .doc files, change some properties, and save them. How can I do this?

I can read .docx files, like this:

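const AdmZip = require('adm-zip');

// A .docx file is a ZIP archive; the main document body is stored as XML inside it
const zip = new AdmZip(filePath);
const xml = zip.readAsText('word/document.xml');
console.log(xml);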

// <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
// <w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex...

I tried to read a .doc file like this:

const fs = require('fs');
const expectedXml = fs.readFileSync(filePath); // returns a Buffer of raw bytes

but the result is unreadable.

I expect to get XML like in the .docx example.

Upvotes: 2

Views: 3903

Answers (1)

kjhughes

Reputation: 111696

Microsoft DOC files predate DOCX and are not based on zipped XML (OOXML in an OPC package); they use a binary file format. That is why reading one with fs.readFileSync yields raw bytes rather than XML.
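
To make the difference concrete, here is a minimal sketch that tells the two containers apart by their leading bytes. Legacy DOC files are stored in an OLE2 compound file, which starts with the bytes D0 CF 11 E0 A1 B1 1A E1, while DOCX files are ZIP archives, which start with "PK" followed by 03 04. The helper names sniffWordFormat and sniffWordFile are hypothetical, not part of any library.

```javascript
const fs = require('fs');

// Leading bytes of an OLE2 compound file (legacy .doc) and of a ZIP archive (.docx).
const OLE2_SIGNATURE = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Hypothetical helper: classify a buffer by its leading bytes.
function sniffWordFormat(buffer) {
  if (buffer.subarray(0, OLE2_SIGNATURE.length).equals(OLE2_SIGNATURE)) return 'ole2';
  if (buffer.subarray(0, ZIP_SIGNATURE.length).equals(ZIP_SIGNATURE)) return 'zip';
  return 'unknown';
}

// Hypothetical helper: read only the first 8 bytes of a file and classify them.
function sniffWordFile(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, header, 0, 8, 0);
    return sniffWordFormat(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

Note that this only identifies the container: other legacy Office formats (.xls, .ppt) share the OLE2 signature, and any ZIP archive shares the ZIP one. A result of 'ole2' means the AdmZip approach from the question cannot work on that file.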

For one-offs, open the DOC file in MS Word or LibreOffice and re-save as a DOCX.
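
If there are more files than you want to re-save by hand, the same conversion can probably be scripted. The sketch below assumes LibreOffice is installed and its soffice binary is on the PATH, and uses its headless --convert-to option. The function names buildConvertArgs and convertDocToDocx are hypothetical.

```javascript
const { execFile } = require('child_process');
const path = require('path');

// Build the argument list for a headless LibreOffice conversion to DOCX.
function buildConvertArgs(docPath, outDir) {
  return ['--headless', '--convert-to', 'docx', '--outdir', outDir, docPath];
}

// Hypothetical helper: convert one .doc file and resolve with the expected output path.
// Assumes the `soffice` binary is on the PATH.
function convertDocToDocx(docPath, outDir) {
  return new Promise((resolve, reject) => {
    execFile('soffice', buildConvertArgs(docPath, outDir), (error) => {
      if (error) return reject(error);
      // LibreOffice writes <original base name>.docx into outDir.
      const baseName = path.basename(docPath, path.extname(docPath));
      resolve(path.join(outDir, `${baseName}.docx`));
    });
  });
}
```

Once the file is a DOCX, the AdmZip approach from the question can read word/document.xml from it.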

To extract the text programmatically in Node.js, use a package such as textract.
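
A minimal sketch of that approach, assuming the textract package is installed from npm and that whatever external tool it relies on for DOC files is available on the system. The wrapper name extractDocText is hypothetical; fromFileWithPath is textract's callback-style entry point.

```javascript
// Hypothetical wrapper: resolve with the plain text of a document.
function extractDocText(filePath) {
  return new Promise((resolve, reject) => {
    // Loaded lazily so this file can be required even when textract is not installed.
    const textract = require('textract');
    textract.fromFileWithPath(filePath, (error, text) => {
      if (error) return reject(error);
      resolve(text);
    });
  });
}

// Usage (the path is a placeholder):
// extractDocText('./example.doc').then(console.log).catch(console.error);
```

This yields plain text only, not XML, so it suits reading content rather than changing document properties and saving the file back.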

Upvotes: 1
