axtscz

Reputation: 691

Finding and replacing a pattern with bold and normal characters

As the title suggests, I have an odd task and was wondering whether there is a faster way to do it. I have a list in a Word document. Each line contains data that looks like this:

Bold Text Normal Text

I need to insert something between the bold text and the normal text. Is there any way to find only the places that match that pattern (i.e. bold text, a space, then normal text)? Then I could easily insert what I need. Maybe with a regex?

Upvotes: 0

Views: 1229

Answers (1)

arieljannai

Reputation: 2146

OK, here is a somewhat extreme idea:

Is the document you are talking about a .docx file? If not, I guess you can convert it to one.

I tried this on a .docx file without a regex, but I'm sure you'll be able to take care of that part :)

So!

  • Extract the .docx file as a zip archive (a .docx file is really a zip package).
    • You can add .zip to the end of the file name, or just open the file with an archiver such as 7-Zip.
  • Navigate to the folder named word inside the extracted folder.
  • Open document.xml with your preferred editor.
  • Each piece of text whose style changes is wrapped in its own tag (a <w:r> run element).
  • Find a string that looks like this: <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:b w:val="1"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">bold text </w:t></w:r>
    • That is what one styled string section looks like.
    • The tag <w:b w:val="1"/>, with the value 1, indicates that the text in this section ("bold text ") is bold.
  • Create a string that looks like the one shown above, containing the text you want, and insert it. For example, if you want the new text to have another style, such as italic, use <w:i w:val="1"/> (i instead of b).
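If you'd rather script these steps than unzip and edit by hand, here is a minimal Python sketch using only the standard library. read_document_xml and make_run are hypothetical helper names I made up for illustration; the sketch assumes document.xml is UTF-8 and leaves out the optional w:rsid* attributes, writing formatting as w:val="1" like the example above.

```python
import io
import zipfile
from xml.sax.saxutils import escape

def read_document_xml(docx_file) -> str:
    """Return word/document.xml from a .docx (which is a zip package) as text.
    docx_file may be a path or a binary file object; UTF-8 is assumed."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.read("word/document.xml").decode("utf-8")

def make_run(text: str, bold: bool = False, italic: bool = False) -> str:
    """Build a <w:r> run string shaped like the example above.
    The optional w:rsid* attributes are left out; the text is XML-escaped."""
    props = ""
    if bold:
        props += '<w:b w:val="1"/>'
    if italic:
        props += '<w:i w:val="1"/>'
    rpr = f"<w:rPr>{props}</w:rPr>" if props else ""
    return f'<w:r>{rpr}<w:t xml:space="preserve">{escape(text)}</w:t></w:r>'

if __name__ == "__main__":
    # Stand-in for a real .docx: an in-memory zip holding only word/document.xml.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
    buf.seek(0)
    print(read_document_xml(buf))  # <w:document/>
    print(make_run("hi im new ", italic=True))
```

With a real file you would pass its path instead, e.g. read_document_xml("list.docx") (hypothetical file name).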

My example:
I wanted to add pictures, but I don't have enough reputation :(
It looks like:

  • Before: bold text normal text
  • After: bold text hi im new normal text

The XML for this example:
https://gist.github.com/arieljannai/08756ef562962eee0798

So the only thing you need to do now is build a regex that finds the parts with w:b tags, together with their surrounding markup, and then you have it :)
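As a rough sketch of that last step for the original task, here is one way to do it in Python. It walks document.xml run by run and, wherever a bold run is directly followed by a non-bold run in the same paragraph, inserts a new plain run between them, then writes an edited copy of the .docx. The patterns and helper names are my own illustration, not the regex from the EDIT below. Assumptions: bold is applied directly (<w:b/> or <w:b w:val="1"/>), not inherited from a paragraph or character style, and runs are not nested.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# One complete <w:r ...>...</w:r> run. The tempered dot (?:(?!</w:r>).)*? stops a
# match from running into the next run. <w:rPr> is not matched as a run start
# because "<w:r" must be followed by whitespace or ">".
RUN = re.compile(r"<w:r(?:\s[^>]*)?>(?:(?!</w:r>).)*?</w:r>", re.S)

# Direct bold: <w:b/> or <w:b w:val="1|true|on"/>. <w:bCs/>, <w:br/> and
# <w:b w:val="0"/> do not match.
BOLD = re.compile(r'<w:b(?:\s+w:val="(?:1|true|on)")?\s*/>')

def make_run(text: str) -> str:
    """Plain (unformatted) run holding text; xml:space keeps edge spaces."""
    return f'<w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r>'

def insert_between_bold_and_normal(xml: str, new_run: str) -> str:
    """Insert new_run after every bold run that is directly followed by a
    non-bold run in the same paragraph."""
    runs = list(RUN.finditer(xml))
    parts, pos = [], 0
    for cur, nxt in zip(runs, runs[1:]):
        gap = xml[cur.end():nxt.start()]  # markup between the two runs
        if (BOLD.search(cur.group()) and not BOLD.search(nxt.group())
                and "</w:p>" not in gap):
            parts.append(xml[pos:cur.end()])
            parts.append(new_run)
            pos = cur.end()
    parts.append(xml[pos:])
    return "".join(parts)

def rewrite_docx(src: str, dst: str, new_text: str) -> None:
    """Copy the .docx at src to dst, editing only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                data = insert_between_bold_and_normal(
                    xml, make_run(new_text)).encode("utf-8")
            zout.writestr(item, data)
```

Usage, with hypothetical file names: rewrite_docx("list.docx", "list-edited.docx", "hi im new "). Comparing each run with the next one (rather than matching bold and normal in one regex) means a bold phrase that Word happened to store as several bold runs still gets only one insertion, after its last bold run. Try it on a copy first.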

Good luck!

EDIT: Here is an example regex I made that matches a style string like the one in the example above:
(<w:r.*?>(?:<w:b\s{1}.*?\/>){1}.*?(?:<w:t\s{1}.*?>(.*?)<\/w:t>)<\/w:r>)

  • The regex matches one section, from the opening <w:r> tag to the closing </w:r> tag (first group).
  • The first non-capturing group makes sure the section has the bold tag ((?:<w:b\s{1}.*?\/>)).
  • The second non-capturing group finds the tag that contains the text (the <w:t> tag).
  • Inside the second non-capturing group is the second capturing group, (.*?), which holds the actual text of that style string (second group).

So the whole style string is in the first group, and only the actual text is in the second group.
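To see what the two groups capture, here is a quick check of the regex exactly as given, run on the bold section from the example. Using Python's re module is my assumption (the answer doesn't name a regex engine). The last two checks show inputs where the pattern may not behave as you'd expect: a plain run in front of the bold one gets pulled into group 1, and bold written as <w:b/> with no attributes (which Word itself often writes) is not matched.

```python
import re

# The regex from the answer, unchanged; Python accepts the escaped slashes.
STYLE_RUN = re.compile(
    r'(<w:r.*?>(?:<w:b\s{1}.*?\/>){1}.*?(?:<w:t\s{1}.*?>(.*?)<\/w:t>)<\/w:r>)'
)

# The bold section from the example above.
sample = ('<w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">'
          '<w:rPr><w:b w:val="1"/><w:rtl w:val="0"/></w:rPr>'
          '<w:t xml:space="preserve">bold text </w:t></w:r>')

m = STYLE_RUN.search(sample)
print(m.group(1) == sample)  # True: group 1 is the whole section
print(repr(m.group(2)))      # 'bold text '

# Caveat 1: with a plain run in front, the lazy .*? after <w:r can stretch
# across it, so the match starts at the plain run, not at the bold one.
plain = '<w:r><w:t xml:space="preserve">normal </w:t></w:r>'
print(STYLE_RUN.search(plain + sample).start())  # 0

# Caveat 2: <w:b/> without attributes is not matched, because the pattern
# requires whitespace right after <w:b.
no_attr = '<w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">x</w:t></w:r>'
print(STYLE_RUN.search(no_attr))  # None
```

If your document has either of those shapes, the pattern needs tightening (for example, restricting each match to a single <w:r>...</w:r> section).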

Upvotes: 1
