chandru

Reputation: 1

Word/VBA: Unable to retrieve formatted text from table

I am using the code below to generate XML from the tables of a Word file.

Sub exprt()
    Dim nofT As Integer, cnt As Integer, tag As Integer, btxt As Integer
    Dim Rcnt As Long
    Dim tit As String, pTyp As String, cValu As String
    Dim dum As Range
    Set a = ActiveDocument.Tables
    nofT = ActiveDocument.Tables.Count
    Set docold = ActiveDocument

    For cnt = 5 To nofT
        Selection.TypeText ("")
        Selection.TypeParagraph
        Selection.TypeText ("")
        Selection.TypeParagraph
        Set dum = ActiveDocument.Tables(cnt).Cell(2, 1).Range
        dum.End = dum.End - 1
        tit = dum.Text
        tit = LTrim(Right(tit, (Len(tit) - InStr(tit, ":"))))
        Selection.TypeText (Chr(9) + "<" + Chr(33) + "[CDATA[" + tit + "]]><" + Chr(47) + "title_text_1>")
        Selection.TypeParagraph
        Selection.TypeParagraph
        Rcnt = a(cnt).Rows.Count
        Set dum = ActiveDocument.Tables(cnt).Cell(3, 1).Range
        dum.End = dum.End - 1
        pTyp = dum.Text
        pTyp = LTrim(Right(pTyp, (Len(pTyp) - InStr(pTyp, ":"))))
        tag = 1
        For btxt = 6 To Rcnt - 1
            Set dum = ActiveDocument.Tables(cnt).Cell(btxt, 1).Range.FormattedText
            dum.End = dum.End - 1
            Selection.TypeText (Chr(9) + "<" + Chr(33) + "[CDATA[" + dum + "]]><" + Chr(47) + "body_text_" + LTrim(Str(tag)) + ">")
            Selection.TypeParagraph
            Selection.TypeParagraph
            tag = tag + 1
        Next
        Set dum = docold.Tables(cnt).Cell(Rcnt, 1).Range
        dum.End = dum.End - 1
        Selection.TypeText (Chr(9) + "<" + Chr(33) + "[CDATA[" + dum + "]]><" + Chr(47) + "prompt_text_1>")
        Selection.TypeParagraph
        Selection.TypeParagraph
        Selection.TypeText ("<" + Chr(47) + "data>")
        Selection.TypeParagraph
        Selection.Extend
        Selection.HomeKey Unit:=wdStory
        newTxt = ActiveDocument.ActiveWindow.Selection
        Selection.Cut
        Set dum = docold.Tables(cnt).Cell(1, 1).Range
        dum.End = dum.End - 1
        Call pub(dum.Text)
    Next
End Sub


Sub pub(nme As String)
    Dim FolderPath As String
    Dim FileName As String
    FolderPath = "c:/Chandru/"
    FileName = nme & ".xml"
    Documents.Add
    Selection.Paste
    Call bld
    ActiveDocument.SaveAs FileName:=FolderPath & FileName, FileFormat:=wdFormatText
    ActiveDocument.Close
End Sub

Problem: I want to add "<b>" and "</b>" around bold text, and similarly "<i>" and "</i>" around italic text. When I retrieve the table cell content, I do not get the formatting. How can I add <b> and <i> tags for bold and italic text respectively?

Upvotes: 0

Views: 1602

Answers (2)

Andreas J

Reputation: 546

In general I am not sure it is a good idea to generate an XML file this way, especially when adding formatting to your data. But that was not your question.

To extract the formatted text you rely on the FormattedText property of a Range object. You can use the resulting Range object to, e.g., paste the formatted text somewhere else in your document, but it does not directly 'give' you the formatting to use in your code.
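A minimal sketch of that copy behaviour, assuming a document with a table is open in Word; the function name CopyCellFormatted is hypothetical. Assigning one range's FormattedText to another copies the text together with its bold/italic formatting, yet reading .Text on the result still gives plain characters without any tags.

```vba
' Hypothetical helper: copies the content of a table cell, with its formatting,
' into a new document and returns that document's content range.
Function CopyCellFormatted(ByVal srcCell As Cell) As Range
    Dim src As Range
    Dim newDoc As Document

    Set src = srcCell.Range
    src.End = src.End - 1          ' drop the end-of-cell marker

    Set newDoc = Documents.Add
    ' Setting FormattedText replaces the target range with the text AND its formatting
    newDoc.Content.FormattedText = src.FormattedText

    ' The formatting is there (e.g. Characters(1).Bold), but .Text is plain text
    Debug.Print newDoc.Content.Text
    Set CopyCellFormatted = newDoc.Content
End Function
```

For example, Set r = CopyCellFormatted(ActiveDocument.Tables(1).Cell(2, 1)) gives a range whose characters keep their Bold/Italic values, which is why you still have to inspect them yourself to produce tags.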

What you would have to do is either parse this FormattedText range character by character (checking each character's formatting) or use another method (see below).

Example: output every character in the current selection together with some of its formatting:

Dim myRange As Range
Dim myChar As Variant

Set myRange = Selection.FormattedText

For Each myChar In myRange.Characters
    Debug.Print myChar.Text, myChar.Bold, myChar.Italic, myChar.Underline
Next

You could create a function that parses your range into a string including the formatting in HTML format (i.e. bold as ... and so on).

This is doable but a bit tricky, because you need to make sure that you don't create something invalid like

<b>bold <i>bold italic</b> just italic </i> (no good!)

(Word or most browsers might not care, but it is certainly no longer valid XML).
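A minimal sketch of such a function, walking Characters as in the example above. The names TransitionTags and RangeToTaggedText are hypothetical, and the sketch assumes a fixed nesting order (<b> outside, <i> inside): whenever the bold state changes, an open <i> is closed first and reopened afterwards, so overlapping tags like the invalid line above cannot occur.

```vba
' Hypothetical helpers: turn the bold/italic formatting of a Word range into
' <b>/<i> tags. <b> is always the outer tag and <i> the inner one.

' Returns the tags needed to go from the old formatting state to the new one.
Function TransitionTags(ByVal wasBold As Boolean, ByVal wasItalic As Boolean, _
                        ByVal isBold As Boolean, ByVal isItalic As Boolean) As String
    Dim s As String
    If wasBold <> isBold Then
        ' Bold changes: close the inner <i> first, then <b>, then reopen as needed
        If wasItalic Then s = s & "</i>"
        If wasBold Then s = s & "</b>"
        If isBold Then s = s & "<b>"
        If isItalic Then s = s & "<i>"
    ElseIf wasItalic <> isItalic Then
        ' Only italic changes: the inner tag can be opened/closed on its own
        If wasItalic Then s = s & "</i>" Else s = s & "<i>"
    End If
    TransitionTags = s
End Function

' Walks the range character by character and inserts tags where the formatting changes.
Function RangeToTaggedText(ByVal rng As Range) As String
    Dim ch As Range
    Dim result As String
    Dim isBold As Boolean, isItalic As Boolean
    Dim wasBold As Boolean, wasItalic As Boolean   ' both start as False

    For Each ch In rng.Characters
        ' Bold/Italic return True, False or wdUndefined; treat only True as "on"
        isBold = (ch.Bold = True)
        isItalic = (ch.Italic = True)
        result = result & TransitionTags(wasBold, wasItalic, isBold, isItalic) & ch.Text
        wasBold = isBold
        wasItalic = isItalic
    Next

    ' Close whatever is still open at the end of the range
    RangeToTaggedText = result & TransitionTags(wasBold, wasItalic, False, False)
End Function
```

In the question's inner loop you could pass the trimmed cell range (after dum.End = dum.End - 1) and concatenate RangeToTaggedText(dum) instead of dum. Walking Characters is slow on large tables, and characters such as "<" or "&" in the cell text are not escaped, which is fine inside the question's CDATA sections but not elsewhere.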

You might consider some other way to extract your formatted table content. Since Word 2003 you can save your documents in XML format, so you could try to extract your data from there. Word does keep track of open formatting tags, but as always with Word, you have a lot of garbage.

... [extract from an XML-Export; use e.g. Notepad++ to get a pretty-print version]
  <w:body>
    <wx:sect>
      <w:tbl>
        <w:tblPr>
          <w:tblStyle w:val="Tabellengitternetz"/>
          <w:tblW w:w="0" w:type="auto"/>
          <w:tblLook w:val="01E0"/>
        </w:tblPr>
        <w:tblGrid>
          <w:gridCol w:w="9286"/>
        </w:tblGrid>
        <w:tr wsp:rsidR="007A0EF3" wsp:rsidTr="007A0EF3">
          <w:tc>
            <w:tcPr>
              <w:tcW w:w="9286" w:type="dxa"/>
            </w:tcPr>
            <w:p wsp:rsidR="007A0EF3" wsp:rsidRDefault="007A0EF3" wsp:rsidP="007A0EF3">
              <w:r>
                <w:t>Titel </w:t>
              </w:r>
              <w:proofErr w:type="spellStart"/>
              <w:r wsp:rsidRPr="007A0EF3">
                <w:rPr>
                  <w:b/>
                </w:rPr>
                <w:t>Bold</w:t>
              </w:r>
              <w:proofErr w:type="spellEnd"/>
              <w:r>
                <w:t/>
              </w:r>
              <w:proofErr w:type="spellStart"/>
              <w:r wsp:rsidRPr="007A0EF3">
                <w:rPr>
                  <w:i/>
                </w:rPr>
                <w:t>Italic</w:t>
              </w:r>
              <w:proofErr w:type="spellEnd"/>
              <w:r>
                <w:t/>
              </w:r>
              <w:proofErr w:type="spellStart"/>
              <w:r wsp:rsidRPr="007A0EF3">
                <w:rPr>
                  <w:b/>
                  <w:i/>
                </w:rPr>
                <w:t>BoldItalic</w:t>
              </w:r>
              <w:proofErr w:type="spellEnd"/>
            </w:p>
          </w:tc>
        </w:tr>
...

Digging through the unimportant bits, you find your text body, your table(s), your text and the corresponding formatting.
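A minimal sketch of reading those runs from VBA without saving a file, assuming Range.XML is available in your Word version and returns WordML like the extract above (namespace http://schemas.microsoft.com/office/word/2003/wordml), and that MSXML 6.0 is installed. The function names HasRunFlag and RangeToTaggedTextViaXML are hypothetical. Each run (w:r) is wrapped in its own tags, so adjacent runs may give "<b>a</b><b>b</b>", but tags can never overlap. Only direct formatting in w:rPr is detected; bold or italic that comes from a style is missed.

```vba
' True if the run has <w:b/> or <w:i/> (flagName = "b" or "i") in its w:rPr,
' unless the value is explicitly switched off (w:val="off").
Function HasRunFlag(ByVal runNode As Object, ByVal flagName As String) As Boolean
    Dim flagNode As Object
    Dim flagVal As Variant
    Set flagNode = runNode.SelectSingleNode("w:rPr/w:" & flagName)
    If flagNode Is Nothing Then Exit Function          ' returns False
    flagVal = flagNode.getAttribute("w:val")           ' Null when the attribute is absent
    If IsNull(flagVal) Then
        HasRunFlag = True
    Else
        HasRunFlag = (flagVal <> "off")
    End If
End Function

' Hypothetical helper: tags bold/italic runs using the WordML of the range.
' MSXML is late bound, so no project reference is required.
Function RangeToTaggedTextViaXML(ByVal rng As Range) As String
    Const W_NS As String = "xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'"
    Dim xmlDoc As Object, paraNode As Object, runNode As Object
    Dim piece As String, result As String
    Dim isFirst As Boolean

    Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
    xmlDoc.async = False
    If Not xmlDoc.LoadXML(rng.XML) Then Exit Function   ' returns "" if the XML cannot be parsed
    xmlDoc.setProperty "SelectionNamespaces", W_NS

    isFirst = True
    For Each paraNode In xmlDoc.SelectNodes("//w:body//w:p")
        If Not isFirst Then result = result & vbCr        ' keep paragraph breaks
        isFirst = False
        For Each runNode In paraNode.SelectNodes(".//w:r")
            piece = runNode.Text                          ' text of the run's w:t element(s)
            If Len(piece) > 0 Then
                ' Every run is opened and closed on its own, so tags never overlap
                If HasRunFlag(runNode, "i") Then piece = "<i>" & piece & "</i>"
                If HasRunFlag(runNode, "b") Then piece = "<b>" & piece & "</b>"
                result = result & piece
            End If
        Next
    Next
    RangeToTaggedTextViaXML = result
End Function
```

Called on a cell range such as ActiveDocument.Tables(cnt).Cell(btxt, 1).Range from the question, it should return the cell text with tags that could go into the CDATA section; check the result against your own documents before relying on it.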

But as with most questions like this it all comes down to how often you need to run this task (how much effort you want to put into programming an automated solution), how much data is read and so on.

HTH Andreas

Upvotes: 3

Foole

Reputation: 4850

You will have to check the character formatting of each character in the Range individually.

Upvotes: 0
