Between double curly braces: replace particular text

Question

I've got a string (Python 2.7.3) which is rendered as a template in Django but I don't think this is specific to Django. The string comes from the document.xml file inside a docx file. I'm extacting the document xml rendering it and putting it back inside the docx for some simple mail merge type stuff.

One of the issues, other than the obvious limitations to what template tags I can use, is that Word likes to drop in a whole bunch of xml if you edit the text in Word.

For my needs, I'd be successful if I could

find all occurrences of " between double curly braces and replace with a quote ".

I'd like to replace the " with " in something like the following:

word_docxml = 'some text here {{form.letterdate|date:"Y-m-d"}} and more text'

I was reading over these:

but having trouble putting it together.

How do I remove/strip everything inside and including the < > in between {{ }}'s in a mess like the following:


  
  
  
  
  

{{form.undefinedundefined

  
  
  
  
  
  
  

Lundefinedundefined

  
  
  
  
  

etterDate.value|date:"Y-m-d"}}undefined

which would result in the following (apologies, I can't seem to highlight the area of interest):


  
  
  
  
  

{{form.LetterDate.value|date:"Y-m-d"}}undefined

How does one handle this? Is regex the way to go; if so, how to put the command together?

This is not a duplicate of Between double curly braces: replace particular text because it has no mention of handling a double curly brace for start and end for the search range (that was my real problem, I've read through many examples and was unable to get the pattern for substitution formatted correctly). The other post is about parsing a subset of html entities in XHTML; there is no XHTML parsing required, mentioned or questioned in my post. This post here asks how to remove and/or replace a repeating pattern between two other known start/end patterns. I provided a brief background, two concrete examples from the simple to the complex hoping to learn how to accomplish my current task - my best hope was to get part A explained and apply the method myself to part B. I got intelligent discussion and super replies from helpful members of the community. My post doesn't involve HTML at all as the template I'm rendering in Django is added back to a docx archive and saved to a filestore. It is not a duplicate (of the marked duplicate anyhow).

melwil · Accepted Answer

Yes, regex is great for this!

a) Use this:

 re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub(""", '"', m.group(1)), word_docxml)

Results:

>>> word_docxml = 'some text here {{form.letterdate|date:"Y-m-d"}} and " more text'
>>> re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub(""", '"', m.group(1)), word_docxml)
'some text here {{form.letterdate|date:"Y-m-d"}} and " more text'

b) More of the same, just matching different content inside the braces;

re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub("<[^>]+>", "", m.group(1)), s)

Results:

>>> s = """{{form.LetterDate.value|date:"Y-m-d"}}"""
>>> re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub("<[^>]+>", "", m.group(1)), s)
'{{form.LetterDate.value|date:"Y-m-d"}}'

Explanation, since you asked for guidance, not just the answer;

re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub(""", '"', m.group(1)), word_docxml)

The way this works is to first match a double brace interval. The lambda expression just takes the group found in that match and does the replace of the relevant content.

The smaller regexes explained:

"     # Just matching that, nothing fancy

A pattern to match tags;

<     # Opening of tag
[^>]+ # Followed by 1 or more characters that are not closing tags
>     # Followed by a closing tag

Between double curly braces: replace particular text

Answers (2)

Related Questions