Reputation: 2262
I'm exporting a database query as Excel and I am getting rows with RTF formatting.
How can I convert these fields into plain text? I've found answers that are pretty old, so I was wondering if anyone knows a way.
Upvotes: 5
Views: 16183
Reputation: 2262
I'm revisiting this question to provide 2 javascript solutions, rather than a .NET
one.
const parseRTF = require("rtf-parser");
let rtf = `{\\rtf1\\ansi\\deff0\\nouicompat{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}{\\f1\\fnil\\fcharset204 Calibri;}{\\f2\\fnil Calibri;}} {\\colortbl ;\\red0\\green0\\blue0;} {\\*\\generator Riched20 10.0.19041}\\viewkind4\\uc1 \\pard\\cf1\\f0\\fs18\\lang1033 WEB1616 \\f1\\lang1071\\'ef\\'eb\\'e0\\'f2\\'e5\\'ed\\'ee \\'f1\\'ee \\'ea\\'e0\\'f0\\'f2\\'e8\\'f7\\'ea\\'e0\\par \\'ca\\'f0\\'e8\\'f1\\'f2\\'e8\\'ed\\'e0 \\'c3\\'ee\\'eb\\'e0\\'e1\\'ee\\'f1\\'ea\\'e0 077640615\\par \\'c2\\'e0\\'f0\\'f8\\'e0\\'e2\\'f1\\'ea\\'e0 6\\'e0\\par 1000 \\'d1\\'ea\\'ee\\'ef\\'bc\\'e5\\f2\\lang1033\\par } `;
function convertRTFtoPlainText(rtf) {
return new Promise((resolve, reject) => {
parseRTF.string(rtf, (err, doc) => {
if (err) {
reject(err);
}
let string = "";
doc.content.forEach((item) => {
if (item.content) {
item.content.forEach((span) => {
string += span.value;
});
} else {
string += item.value;
}
});
resolve(string.trim());
});
});
}
(async () => {
let value = await convertRTFtoPlainText(rtf);
console.log(value);
})();
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
function stringToArrayBuffer(string) {
if (string == null) return;
let buffer = new ArrayBuffer(string.length);
let bufferView = new Uint8Array(buffer);
for (let i = 0; i < string.length; i++) {
bufferView[i] = string.charCodeAt(i);
}
return buffer;
}
// callback = function to run after the DOM has rendered, defined when calling runRtfjs
function runRtfjs(rtf, callback, errorCallback) {
const virtualConsole = new jsdom.VirtualConsole();
virtualConsole.sendTo(console);
let dom = new JSDOM(
`
<script src="./node_modules/rtf.js/dist/RTFJS.bundle.js"></script>
<script>
RTFJS.loggingEnabled(false);
try {
const doc = new RTFJS.Document(rtfFile);
const meta = doc.metadata();
doc
.render()
.then(function(htmlElements) {
const div = document.createElement("div");
div.append(...htmlElements);
// window.done(meta, div.innerHTML);
// window.done(meta, div.innerText);
window.done(meta, div.textContent); // pass the data to the callback
}).catch(error => window.onerror(error))
} catch (error){
window.onerror(error)
}
</script>
`,
{
resources: "usable",
runScripts: "dangerously",
url: "file://" + __dirname + "/",
virtualConsole,
beforeParse(window) {
window.rtfFile = stringToArrayBuffer(rtf);
window.done = function (meta, html) {
callback(meta, html); // call the callback
};
window.onerror = function (error) {
errorCallback(error);
};
},
}
);
}
let rtf = `{\\rtf1\\ansi\\deff0\\nouicompat{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}{\\f1\\fnil\\fcharset204 Calibri;}{\\f2\\fnil Calibri;}} {\\colortbl ;\\red0\\green0\\blue0;} {\\*\\generator Riched20 10.0.19041}\\viewkind4\\uc1 \\pard\\cf1\\f0\\fs18\\lang1033 WEB1616 \\f1\\lang1071\\'ef\\'eb\\'e0\\'f2\\'e5\\'ed\\'ee \\'f1\\'ee \\'ea\\'e0\\'f0\\'f2\\'e8\\'f7\\'ea\\'e0\\par \\'ca\\'f0\\'e8\\'f1\\'f2\\'e8\\'ed\\'e0 \\'c3\\'ee\\'eb\\'e0\\'e1\\'ee\\'f1\\'ea\\'e0 077640615\\par \\'c2\\'e0\\'f0\\'f8\\'e0\\'e2\\'f1\\'ea\\'e0 6\\'e0\\par 1000 \\'d1\\'ea\\'ee\\'ef\\'bc\\'e5\\f2\\lang1033\\par } `;
runRtfjs(
rtf,
(meta, html) => {
console.log(html);
},
(error) => console.error(error)
);
Upvotes: 0
Reputation: 41
The .Net Framework RichTextBox class can perform the conversion. Fortunately, this class has the ComVisibleAttribute set, so it can be used from VBA without much difficulty.
I had to create a .tlb file to Reference. In the
%SYSTEMROOT%\Microsoft.NET\Framework\currentver\
directory, run the command
regasm /codebase system.windows.forms.dll
to create the system.windows.forms.tlb file. I already had this .tlb file on my system, but I had to recreate it using this command to be able to create a .Net System.Windows.Forms RichTextBox object successfully in VBA.
With the new .tlb file created, in VBA link it to your project via Tools->References in the VBA IDE.
I wrote this test code in Access to demonstrate the solution.
Dim rtfSample As String
rtfSample = "{\rtf1\ansi\deflang1033\ftnbj\uc1 {\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset0 Segoe UI;}} {\colortbl ;\red255\green255\blue255 ;} {\stylesheet{\fs22\cf0\cb1 Normal;}{\cs1\cf0\cb1 Default Paragraph Font;}} \paperw12240\paperh15840\margl1440\margr1440\margt1440\margb1440\headery720\footery720\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot \sectd\pgwsxn12240\pghsxn15840\marglsxn1440\margrsxn1440\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgnstarts1\pgncont\pgndec \plain\plain\f1\fs22\lang1033\f1 hello question stem\plain\f1\fs22\par}"
Dim miracle As System_Windows_Forms.RichTextBox
Set miracle = New System_Windows_Forms.RichTextBox
With miracle
.RTF = rtfSample
RTFExtractPlainText = .TEXT
End With
MsgBox RTFExtractPlainText(rtfSample)
With the result
I'd assume re-creating the .tlb file in the \Framework64\ directory would be needed on 64-bit Windows with 64-bit Office. I am running 64-bit Win10 with 32-bit Office 2013, so I had to have a 32-bit .tlb file.
Upvotes: 4
Reputation: 22866
Another alternative can be using Microsoft Rich Textbox Control (but can't test it on x64 Office)
Sub rtfToText()
With CreateObject("RICHTEXT.RichtextCtrl") ' or add reference to Microsoft Rich Textbox Control for early binding and With New RichTextLib.RichTextBox
.SelStart = 0 ' needs to be selected
.TextRTF = Join(Application.Transpose(Cells.CurrentRegion.Columns(1)))
[C1] = .Text ' set the destination cell here
' or if you want them in separate cells:
a = Split(.Text, vbNewLine)
Range("C3").Resize(UBound(a) + 1) = Application.Transpose(a)
End With
End Sub
Upvotes: 1