TotoKalvera

Reputation: 37

Creating a delimited text using regex

I have a glossary where the entries are given in bold text and the explanatory text is in regular font. I'd like to add a comma (or an asterisk, or any other punctuation sign) after every string of bold text, which would then let me turn the glossary into comma-delimited text in Excel. Is there a way to do this using regular expressions in Word's Find and Replace dialog box, so that I get a comma at the end of the word or phrase that is an entry in the glossary? Here's one entry from the glossary as an example:

To abolish at entry into force ....content of the entry in regular font..

The entry is the bolded phrase, and the explanation associated with it is given in regular text.

After trying the expression <[A-Za-z\,.-)(/\?! ]{1,}> that Jerry suggested in the Find box, with ^&, in the Replace box in MS Word, I get the desired result for the bold phrases that are followed by a paragraph break, such as titles and headings:


http://img811.imageshack.us/img811/6338/frontpagep.jpg


but no changes happen on the glossary entries, because they are followed by the content of the entry, with no paragraph break after them. Here's a sample from the glossary, showing the characteristic layout of the glossary entries:


http://img571.imageshack.us/img571/6558/samplefromtheglossary.jpg


Upvotes: 1

Views: 601

Answers (3)

Jerry

Reputation: 71578

Could you try this one in the Find box, instead of the <*> George suggested previously?

<[A-Za-z\,\.\-\)\(\/\?\! ]{1,}>

Following the OP's new edit:

You could try putting this in a macro:

Sub CommaAdder()
    ' Every pass below is a wildcard Replace All restricted to bold text.
    ' The passes run in this order; each one works on the result of the last.

    ' 1. Put a comma after every bold word.
    BoldReplaceAll "<[A-Za-z]@>", "^&,"
    ' 2. Move a comma that landed before ")" to after it.
    BoldReplaceAll ",([\)])", "\1,"
    ' 3. Drop a comma that is followed by punctuation or a space.
    BoldReplaceAll ",([\-\?\/\!\.\, ])", "\1"
    ' 4. Put a comma after a word together with its trailing bold space.
    BoldReplaceAll "([A-Za-z]@ )", "\1,"
    ' 5. Turn " ," followed by a word into ", " followed by that word.
    BoldReplaceAll "( \,)([A-Za-z]@)", ", \2"
    ' 6. Drop a comma that sits directly before "(".
    BoldReplaceAll "\,\(", "("
    ' 7. Drop a comma that is followed by a space.
    BoldReplaceAll "\, ", " "
    ' 8. Turn a space followed by a comma into a comma followed by a space.
    BoldReplaceAll " \,", ", "
End Sub

' Runs one wildcard find-and-replace over the whole document, matching bold text only.
Private Sub BoldReplaceAll(ByVal findText As String, ByVal replaceText As String)
    With Selection.Find
        .ClearFormatting
        .Font.Bold = True
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub

I don't know how to write macros, but I recorded one replace and built this up from it. If you understand that a bit, there are several replaces going on one after the other. One looks for all bold words and puts a comma after each of them, even if they are followed by ) or ., etc. Another looks specifically for the instances I just mentioned and removes that comma, along with the ones where we have a comma followed by a bold white space. The exception is the ,) case, which gets its own replace that turns it into ), instead.

The issue is that if you have something like:

This is bold but not this

If the white space between "bold" and "but" also has the bold format, the comma after "bold" will be removed by the replace that strips commas before bold white space. If there was a way to find text that is part bold and part not bold, there wouldn't be any issue. I'm trying to look for a solution to that, but let me know if there are any issues with this code. If there is no white space formatted as bold like that, there won't be any issue!

reEDIT: This works for bold spaces too now! Though it's not too neat...
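
A possibly neater alternative, offered only as an untested sketch: instead of adding commas word by word and cleaning them up afterwards, search on the bold formatting alone, so that each match is a whole run of bold text, then trim any trailing bold white space and append the delimiter once. The macro name DelimitBoldRuns and the DELIM constant are made up for illustration, and I have not checked how Word reports a bold run that crosses a paragraph mark, so try it on a copy of the document first.

```vba
Sub DelimitBoldRuns()
    ' Hypothetical sketch, not part of the recorded macro above.
    Const DELIM As String = ","   ' assumption: any single delimiter would do
    Dim rng As Range              ' search cursor
    Dim hit As Range              ' one run of bold text
    Dim nextPos As Long           ' where the next search starts

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Font.Bold = True         ' match on formatting only
        .Text = ""
        .Format = True
        .Forward = True
        .Wrap = wdFindStop        ' do not wrap around at the end
        .MatchWildcards = False
    End With

    Do While rng.Find.Execute
        Set hit = rng.Duplicate
        ' Pull the end back over trailing spaces, tabs and paragraph marks.
        hit.MoveEndWhile Cset:=" " & vbTab & vbCr, Count:=wdBackward
        If hit.End > hit.Start Then
            hit.InsertAfter DELIM
            ' Un-bold the delimiter so it is not picked up as a new bold run.
            ActiveDocument.Range(hit.End - Len(DELIM), hit.End).Font.Bold = False
        End If

        ' Continue after whichever ends later: the original run or the delimiter.
        nextPos = rng.End
        If hit.End > nextPos Then nextPos = hit.End
        ' Guard against looping forever on a match at the very end.
        If nextPos >= ActiveDocument.Content.End - 1 Then Exit Do
        rng.SetRange Start:=nextPos, End:=nextPos
    Loop
End Sub
```

Because the delimiter is added once per bold run, a bold space inside or at the end of an entry should not need any clean-up pass. Bold titles and headings get a delimiter as well, just as they do with the replace-based macro.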

Upvotes: 0

Mikeb

Reputation: 6361

Some simple searching around shows the following documents that apply to older versions of Word, up to 2002 at least.

office support document

Note that Word's implementation of regex does not conform closely to other implementations.

I didn't see a doc for a more recent version of Word. The Search and Replace in Visual Studio (I think this works in the free version too), or in an IDE like Eclipse, does support Regex, so you have lots of non-Word options that might work as well.

Upvotes: 0

George

Reputation: 357

Use wildcards, and set the format font to bold. For search, enter <*>. For replace, enter ^&,. A picture is worth a thousand scripts.

Upvotes: 0
