Sir Rubberduck
Sir Rubberduck

Reputation: 2262

Convert RTF (Rich Text Format) code into plain text in Excel

I'm exporting a database query as Excel and I am getting rows with RTF formatting.

Here is a screenshot of the Excel

How can I convert these fields into plain text? I've found answers that are pretty old, so I was wondering if anyone knows a way.

Upvotes: 5

Views: 16183

Answers (3)

Sir Rubberduck
Sir Rubberduck

Reputation: 2262

I'm revisiting this question to provide 2 javascript solutions, rather than a .NET one.

Approach 1

const parseRTF = require("rtf-parser");

let rtf = `{\\rtf1\\ansi\\deff0\\nouicompat{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}{\\f1\\fnil\\fcharset204 Calibri;}{\\f2\\fnil Calibri;}}  {\\colortbl ;\\red0\\green0\\blue0;}  {\\*\\generator Riched20 10.0.19041}\\viewkind4\\uc1   \\pard\\cf1\\f0\\fs18\\lang1033 WEB1616 \\f1\\lang1071\\'ef\\'eb\\'e0\\'f2\\'e5\\'ed\\'ee \\'f1\\'ee \\'ea\\'e0\\'f0\\'f2\\'e8\\'f7\\'ea\\'e0\\par  \\'ca\\'f0\\'e8\\'f1\\'f2\\'e8\\'ed\\'e0 \\'c3\\'ee\\'eb\\'e0\\'e1\\'ee\\'f1\\'ea\\'e0 077640615\\par  \\'c2\\'e0\\'f0\\'f8\\'e0\\'e2\\'f1\\'ea\\'e0 6\\'e0\\par  1000 \\'d1\\'ea\\'ee\\'ef\\'bc\\'e5\\f2\\lang1033\\par  }  `;

function convertRTFtoPlainText(rtf) {
    return new Promise((resolve, reject) => {
        parseRTF.string(rtf, (err, doc) => {
            if (err) {
                reject(err);
            }

            let string = "";

            doc.content.forEach((item) => {
                if (item.content) {
                    item.content.forEach((span) => {
                        string += span.value;
                    });
                } else {
                    string += item.value;
                }
            });

            resolve(string.trim());
        });
    });
}

(async () => {
    let value = await convertRTFtoPlainText(rtf);

    console.log(value);
})();

Approach 2

const jsdom = require("jsdom");
const { JSDOM } = jsdom;

function stringToArrayBuffer(string) {
    if (string == null) return;
    let buffer = new ArrayBuffer(string.length);
    let bufferView = new Uint8Array(buffer);
    for (let i = 0; i < string.length; i++) {
        bufferView[i] = string.charCodeAt(i);
    }
    return buffer;
}

// callback = function to run after the DOM has rendered, defined when calling runRtfjs
function runRtfjs(rtf, callback, errorCallback) {
    const virtualConsole = new jsdom.VirtualConsole();
    virtualConsole.sendTo(console);

    let dom = new JSDOM(
        `
            <script src="./node_modules/rtf.js/dist/RTFJS.bundle.js"></script>

            <script>

                RTFJS.loggingEnabled(false);

                try {
                    const doc = new RTFJS.Document(rtfFile);

                    const meta = doc.metadata();
                    doc
                        .render()
                        .then(function(htmlElements) {

                            const div = document.createElement("div");
                            div.append(...htmlElements);

                            // window.done(meta, div.innerHTML);
                            // window.done(meta, div.innerText);
                            window.done(meta, div.textContent); // pass the data to the callback

                    }).catch(error => window.onerror(error))
                } catch (error){
                    window.onerror(error)
                }
            </script>
        `,
        {
            resources: "usable",
            runScripts: "dangerously",
            url: "file://" + __dirname + "/",
            virtualConsole,
            beforeParse(window) {
                window.rtfFile = stringToArrayBuffer(rtf);
                window.done = function (meta, html) {
                    callback(meta, html); // call the callback
                };
                window.onerror = function (error) {
                    errorCallback(error);
                };
            },
        }
    );
}

let rtf = `{\\rtf1\\ansi\\deff0\\nouicompat{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}{\\f1\\fnil\\fcharset204 Calibri;}{\\f2\\fnil Calibri;}}  {\\colortbl ;\\red0\\green0\\blue0;}  {\\*\\generator Riched20 10.0.19041}\\viewkind4\\uc1   \\pard\\cf1\\f0\\fs18\\lang1033 WEB1616 \\f1\\lang1071\\'ef\\'eb\\'e0\\'f2\\'e5\\'ed\\'ee \\'f1\\'ee \\'ea\\'e0\\'f0\\'f2\\'e8\\'f7\\'ea\\'e0\\par  \\'ca\\'f0\\'e8\\'f1\\'f2\\'e8\\'ed\\'e0 \\'c3\\'ee\\'eb\\'e0\\'e1\\'ee\\'f1\\'ea\\'e0 077640615\\par  \\'c2\\'e0\\'f0\\'f8\\'e0\\'e2\\'f1\\'ea\\'e0 6\\'e0\\par  1000 \\'d1\\'ea\\'ee\\'ef\\'bc\\'e5\\f2\\lang1033\\par  }  `;

runRtfjs(
    rtf,
    (meta, html) => {
        console.log(html);
    },
    (error) => console.error(error)
);

Upvotes: 0

Robert Thompson
Robert Thompson

Reputation: 41

The .Net Framework RichTextBox class can perform the conversion. Fortunately, this class has the ComVisibleAttribute set, so it can be used from VBA without much difficulty.

I had to create a .tlb file to Reference. In the

%SYSTEMROOT%\Microsoft.NET\Framework\currentver\

directory, run the command

regasm /codebase system.windows.forms.dll

to create the system.windows.forms.tlb file. I already had this .tlb file on my system, but I had to recreate it using this command to be able to create a .Net System.Windows.Forms RichTextBox object successfully in VBA.

With the new .tlb file created, in VBA link it to your project via Tools->References in the VBA IDE.

I wrote this test code in Access to demonstrate the solution.

Dim rtfSample As String
rtfSample = "{\rtf1\ansi\deflang1033\ftnbj\uc1 {\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset0 Segoe UI;}} {\colortbl ;\red255\green255\blue255 ;} {\stylesheet{\fs22\cf0\cb1 Normal;}{\cs1\cf0\cb1 Default Paragraph Font;}} \paperw12240\paperh15840\margl1440\margr1440\margt1440\margb1440\headery720\footery720\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot \sectd\pgwsxn12240\pghsxn15840\marglsxn1440\margrsxn1440\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgnstarts1\pgncont\pgndec \plain\plain\f1\fs22\lang1033\f1 hello question stem\plain\f1\fs22\par}"

Dim miracle As System_Windows_Forms.RichTextBox
Set miracle = New System_Windows_Forms.RichTextBox
With miracle
    .RTF = rtfSample 
    RTFExtractPlainText = .TEXT
End With

MsgBox RTFExtractPlainText(rtfSample)

With the result

hello question stem

I'd assume re-creating the .tlb file in the \Framework64\ directory would be needed on 64-bit Windows with 64-bit Office. I am running 64-bit Win10 with 32-bit Office 2013, so I had to have a 32-bit .tlb file.

Upvotes: 4

Slai
Slai

Reputation: 22866

Another alternative can be using Microsoft Rich Textbox Control (but can't test it on x64 Office)

Sub rtfToText()
    With CreateObject("RICHTEXT.RichtextCtrl") ' or add reference to Microsoft Rich Textbox Control for early binding and With New RichTextLib.RichTextBox
        .SelStart = 0                          ' needs to be selected
        .TextRTF = Join(Application.Transpose(Cells.CurrentRegion.Columns(1)))
        [C1] = .Text                           ' set the destination cell here

        ' or if you want them in separate cells:
        a = Split(.Text, vbNewLine)
        Range("C3").Resize(UBound(a) + 1) = Application.Transpose(a)
    End With
End Sub

Upvotes: 1

Related Questions