MJA

Reputation: 522

Using a REGEX to replace words within a sub-match

I hope this isn't a repetition...

I need a regex to do what should be a fairly simple task. I have code for an HTML table, and I want to replace all <td> tags with <th> tags in the first row of the table, i.e. within the first set of <tr> </tr> tags. The table might look something like this:

<table cellpadding="5" cellspacing="0" border="1">
<tr>
<td>Capacity %</td>
<td>Tension V</td>
<td>Acid kg/l</td>
</tr>
<tr>
<td>100</td>
<td>12.70</td>
<td>1.265</td>
</tr>...etc

and I want:

<table cellpadding="5" cellspacing="0" border="1">
<tr>
<th>Capacity %</th>
<th>Tension V</th>
<th>Acid kg/l</th>
</tr>
<tr>
<td>100</td>
<td>12.70</td>
<td>1.265</td>
</tr>...etc

I've tried regexes similar to this:

/(<table>\n<tr>\n)(.+?)(</tr>)

...and then tried to rebuild the table row using back references, but I can't seem to apply the regex to the multiple </?td> matches that there might be.

I'm doing this in JavaScript, which means I can't use look-behinds (although if anyone has a look-behind solution, I'd be interested in seeing it anyway...).

Thanks in advance for any help.

Upvotes: 1

Views: 317

Answers (2)

Tim Pietzcker

Reputation: 336148

You could do it if your regex engine supports indefinite repetition inside lookbehind assertions, for example in .NET (C#):

resultString = Regex.Replace(subjectString, 
    @"(?<=      # Assert that we can match this before the current position:
     <table     # <table
     (?:        # followed by...
      (?!       # (unless there's an intervening
       </table  #  </table
      |         #  or
       </tr     #  </tr)
      )         # (End of lookahead assertion)
      .         # any character
     )*         # any number of times
    )           # (End of lookbehind assertion)
    <(/?)td     # Then match <td or </td (the optional slash is group 1)", 
    "<${1}th", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

This works on your example. But even in .NET, I wouldn't use a regex for this; it's just too brittle. It's better to manipulate the DOM directly, since that's what it's there for.
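For anyone curious how the same look-behind idea might look in JavaScript: engines that implement ES2018 look-behind assertions (recent V8/Node.js, for example) accept unbounded repetition inside `(?<=...)`. The following is only a sketch under that assumption; the function name `headerizeWithLookbehind` is made up for illustration, and the pattern is just as brittle as the .NET one.

```javascript
// Sketch only: requires an engine with ES2018 look-behind support.
// The function name is hypothetical, not part of any library.
function headerizeWithLookbehind(html) {
  // (?<= ... )   look-behind: the text before the match must contain
  //   <table     an opening <table
  //   (?:(?!<\/table|<\/tr)[\s\S])*
  //              followed by any characters, as long as no </table or </tr
  //              starts anywhere in between (so we are still in the first row)
  // <(\/?)td     then match <td or </td, capturing the optional slash
  var pattern = /(?<=<table(?:(?!<\/table|<\/tr)[\s\S])*)<(\/?)td/g;
  return html.replace(pattern, "<$1th");
}
```

Like the .NET version, this rewrites the first row of every table in the string, and it assumes lowercase tag names.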

Upvotes: 1

Stephen Gross

Reputation: 5714

You can't easily do this with a single regex in JavaScript. A plain global replace has no notion of where in the table a match sits, and you've got a special condition ("only in the first <tr>"), so you'll need to write some conditional logic alongside the regex to make it work.
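One way to combine a regex with that extra logic, sketched here in JavaScript, is a two-pass replace: first isolate the first `<tr>...</tr>` block, then rewrite the `td` tags only inside it. The function name `promoteFirstRowToHeader` is made up for this example, and the sketch assumes the string contains a single table with no nested tables.

```javascript
// Sketch only: the function name is hypothetical, and the input is assumed
// to hold one table with no nested tables inside its first row.
function promoteFirstRowToHeader(html) {
  // Pass 1: no "g" flag, so only the first <tr ...>...</tr> is matched.
  // [\s\S] matches any character, including newlines.
  return html.replace(/<tr\b[^>]*>[\s\S]*?<\/tr>/i, function (row) {
    // Pass 2: within that row only, turn <td and </td into <th and </th.
    // $1 re-inserts the optional slash captured by (\/?).
    return row.replace(/<(\/?)td\b/gi, "<$1th");
  });
}
```

Because the outer pattern is not global, later rows never reach the inner replace, which is what stands in for the "only in the first row" condition.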

Upvotes: 0
