marcusstarnes

Reputation: 6531

C# Regular Expression to manipulate text

I have a database with a lot of old phpBB data that contains posts with text like:

[b:522f1e2c15]bold[/b:522f1e2c15]
[i:522f1e2c15]italic[/i:522f1e2c15]
[u:522f1e2c15]underline[/u:522f1e2c15]
[img:522f1e2c15]http://www.mysite.com/myimage.jpg[/img:522f1e2c15]
[quote:522f1e2c15="Mr Smith"]quoted text by Mr Smith[/quote:522f1e2c15]
[quote="Mr Smith"]quoted text by Mr Smith[/quote]

I am migrating this data to a new system, and all of these tags need to be converted at render time so that they become:

<b>bold</b>
<i>italic</i>
<u>underline</u>
<img src="http://www.mysite.com/myimage.jpg" />
<div><h4>Posted by Mr Smith</h4>quoted text by Mr Smith</div>
<div><h4>Posted by Mr Smith</h4>quoted text by Mr Smith</div>

In most cases, the 'id' that appears within the original phpBB tags is the same throughout a post, so a post might look like the following (with all tags containing the id '522f1e2c15'):

This is [b:522f1e2c15]bold[/b:522f1e2c15] and this is [i:522f1e2c15]italic[/i:522f1e2c15].

However, I also need to cater for the id differing from one tag to the next within the same post, for example:

This is [b:123f1e2c15]bold[/b:123f1e2c15] and this is [i:522f1e2c15]italic[/i:522f1e2c15].

I also need to handle nested instances of these tags, such as bold text with some italic text inside it:

This is [b:522f1e2c15]bold and [i:522f1e2c15]this is bold italic[/i:522f1e2c15][/b:522f1e2c15].

I originally posted a similar question here that dealt only with the 'quote' case. It was answered with what appeared to be a working solution, but further testing showed that it breaks when the id contained in one tag is also used in another tag in the same post (as in the examples above).

So essentially I need a regular expression solution that handles all of the above.

Upvotes: 0

Views: 222

Answers (1)

Lucero

Reputation: 60190

Nested tags are the difficult part to handle with a regex, but the .NET regex engine provides the tools required for both the nesting and the ID matching. You can solve the task with a regex that uses balancing groups, with a backreference to tie each closing tag's ID to its opening tag.
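
To illustrate the ID-matching part, here is a minimal sketch. Note that it does not use balancing groups: it swaps in a simpler technique, named backreferences combined with repeated innermost-first replacement, which copes with nesting by converting the deepest pair on each pass. The names PhpBbConverter, Convert, InnermostPair and Render are made up for this example, the set of tags is limited to the ones in the question, and the closing </div> for quotes is an assumption. The text is not HTML-encoded here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhpBbConverter
{
    // Matches one tag pair whose body contains no other known tag (an innermost pair).
    // "id" always participates (it may be empty), so \k<id> also works for tags without an id.
    private static readonly Regex InnermostPair = new Regex(
        @"\[(?<name>b|i|u|img|quote)(?<id>(?::\w+)?)(?:=""(?<author>[^""]*)"")?\]" +
        @"(?<body>(?:(?!\[/?(?:b|i|u|img|quote)[:=\]]).)*)" +
        @"\[/\k<name>\k<id>\]",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Builds the HTML for a single matched pair.
    private static string Render(Match m)
    {
        string name = m.Groups["name"].Value.ToLowerInvariant();
        string body = m.Groups["body"].Value;
        switch (name)
        {
            case "img":
                return "<img src=\"" + body + "\" />";
            case "quote":
                // Assumption: a quote without an author simply gets no heading.
                string heading = m.Groups["author"].Success
                    ? "<h4>Posted by " + m.Groups["author"].Value + "</h4>"
                    : "";
                return "<div>" + heading + body + "</div>";
            default: // b, i, u map directly to the HTML element of the same name
                return "<" + name + ">" + body + "</" + name + ">";
        }
    }

    // Replaces innermost pairs repeatedly until a pass changes nothing.
    public static string Convert(string input)
    {
        string previous;
        string current = input;
        do
        {
            previous = current;
            current = InnermostPair.Replace(previous, Render);
        } while (current != previous);
        return current;
    }
}
```

Because the closing tag must repeat both the name and the id captured on the opening tag, ids that differ from tag to tag within one post are handled independently. Tags left unbalanced in the input are not touched.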

That said, for this kind of data I'd rather implement a true parser, for instance by using a toolset such as the GOLD Parser System or ANTLR.
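
For comparison, the parser idea can be sketched by hand without either toolset. The following is not GOLD or ANTLR output, just a small tokenizer plus a stack, and every name in it (PhpBbStackParser, Convert, OpenHtml, CloseHtml) is hypothetical. It assumes the same tag set as the question and, as a simplification, converts opening tags as soon as they are seen, so an opening tag that is never closed would still be emitted as HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class PhpBbStackParser
{
    // Tokenizer: each match is one opening or closing tag; text between matches is literal.
    private static readonly Regex Tag = new Regex(
        @"\[(?<close>/)?(?<name>b|i|u|img|quote)(?<id>(?::\w+)?)(?:=""(?<author>[^""]*)"")?\]",
        RegexOptions.IgnoreCase);

    public static string Convert(string input)
    {
        var output = new StringBuilder();
        var open = new Stack<string>(); // "name:id" keys of the tags currently open
        int pos = 0;
        foreach (Match m in Tag.Matches(input))
        {
            output.Append(input, pos, m.Index - pos); // literal text before this tag
            pos = m.Index + m.Length;
            string name = m.Groups["name"].Value.ToLowerInvariant();
            string key = name + m.Groups["id"].Value;
            if (!m.Groups["close"].Success)
            {
                open.Push(key);
                output.Append(OpenHtml(name, m.Groups["author"].Value));
            }
            else if (open.Count > 0 && open.Peek() == key)
            {
                open.Pop(); // closing tag matches the most recent opening tag, id included
                output.Append(CloseHtml(name));
            }
            else
            {
                output.Append(m.Value); // mismatched closing tag: left as-is
            }
        }
        output.Append(input, pos, input.Length - pos); // trailing text
        return output.ToString();
    }

    private static string OpenHtml(string name, string author)
    {
        if (name == "img") return "<img src=\"";
        if (name == "quote") return "<div><h4>Posted by " + author + "</h4>";
        return "<" + name + ">";
    }

    private static string CloseHtml(string name)
    {
        if (name == "img") return "\" />";
        if (name == "quote") return "</div>"; // assumption: the quote div is closed here
        return "</" + name + ">";
    }
}
```

The stack makes the nesting rule explicit: a closing tag is only honoured when its name and id match the tag on top of the stack, which is the check a generated parser would enforce through its grammar.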

Upvotes: 1
