Reputation: 71
I need to read .doc files, change some properties, and save them. How can I do this?
I can read .docx files, like this:
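const AdmZip = require('adm-zip');
// A .docx file is a ZIP archive; the main body lives in word/document.xml
const zip = new AdmZip(filePath);
const xml = zip.readAsText('word/document.xml');
console.log(xml);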
//<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
//<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex...
I tried to read a .doc file like this:
const expectedXml = fs.readFileSync(filePath);
but I get an unintelligible result.
I expect to get xml like in the example with .docx.
Upvotes: 2
Views: 3903
Reputation: 111696
Microsoft DOC files predate DOCX and are not based on zipped (OPC) XML (OOXML); they're a binary file format.
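To see the difference for yourself, you can look at the first bytes of the file. This is only a rough sketch: the function names `sniffWordFormat` and `sniffWordFile` are made up for illustration, and the check identifies the container only (OLE2 compound file vs. ZIP), not Word specifically, since legacy .xls/.ppt files and every other ZIP archive share the same signatures.

```javascript
const fs = require('fs');

// Leading bytes of an OLE2 compound file (the container used by legacy .doc)
const OLE2_SIGNATURE = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);
// Leading bytes of a ZIP local file header (the container used by .docx)
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Classify a buffer by its leading bytes.
// Returns 'ole2' (likely a legacy .doc), 'zip' (likely a .docx), or 'unknown'.
function sniffWordFormat(buffer) {
  if (buffer.subarray(0, OLE2_SIGNATURE.length).equals(OLE2_SIGNATURE)) return 'ole2';
  if (buffer.subarray(0, ZIP_SIGNATURE.length).equals(ZIP_SIGNATURE)) return 'zip';
  return 'unknown';
}

// Read only the first 8 bytes of a file and classify them.
function sniffWordFile(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
    return sniffWordFormat(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

If `sniffWordFile(filePath)` returns `'ole2'`, unzipping the file with AdmZip will not work, and reading it with `fs.readFileSync` just hands you the raw binary.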
For one-offs, open the DOC file in MS Word or LibreOffice and re-save as a DOCX.
To extract the text programmatically in Node.js, use a package such as textract.
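A minimal sketch of the textract route, assuming you have run `npm install textract` and, for .doc input on non-macOS systems, have the `antiword` tool installed (textract shells out to it). The wrapper name `extractDocText` and the file name in the usage comment are hypothetical. Note that this gives you plain text only, not XML or document properties, so for editing properties you would still need to convert to DOCX first.

```javascript
// Sketch only: assumes the third-party `textract` package is installed.
function extractDocText(filePath) {
  // Required lazily so the dependency is only loaded when extraction is requested.
  const textract = require('textract');
  return new Promise((resolve, reject) => {
    // fromFileWithPath picks an extractor based on the file's type
    textract.fromFileWithPath(filePath, (error, text) => {
      if (error) reject(error);
      else resolve(text);
    });
  });
}

// Hypothetical usage:
// extractDocText('report.doc').then(console.log).catch(console.error);
```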
Upvotes: 1