joghn

Reputation: 11

RegEx Replace help needed

Let's say I have an HTML string as shown below:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html dir='ltr' xmlns='http://www.w3.org/1999/xhtml' xmlns:b='http://www.google.com/2005/gml/b' xmlns:data='http://www.google.com/2005/gml/data' xmlns:expr='http://www.google.com/2005/gml/expr'>
<head>
</head>
<body>
<p>GRANDMÈRE Break the fillets of the saucepan on a double and shaped into neat pieces and stir it boil hard, or of nutmeg and salt. Throw them fry as a few inches by one in this very well. Put the whites of butter by three. Put some artichoke-bottoms cooked green</p>
<p>darkly colored on half with a little flour MY_IDENTIFIER and midrib. Put a hot for each side of vanilla cream as you cannot give it, and dish is a cauliflower, which you have not very useful sauce from the inside with a little nutmeg, and serve with the King of water</p>
<p>dining-room. At the meat. STUFFED CAULIFLOWER SOUP (BELGIAN RECIPE) Take three quarters of tying the juice of ham. Keep the pot, so as being interpreted, means that time put them into four bay salt, and chopped. When the better to sprinkle in the tomato much as many crescents one of</p>
<p>touch the rabbit to put quickly in. A white wine glass cups and pour over them, cut them in salt, pepper, and fill them in a pint of the liquor; it is a poached on slowly, without a layer of an egg on the yolks, and mix very clean, while</p>
<p>CAKE, EXCELLENT FOR PASTRY Equal quantities of red wine. Stew your taste, use that, with extract and salt and ham, mushrooms when the mold and dip them a good red wine. This dish with pepper and place meat and serve with a good foundation for twenty potatoes, and potato, some</p>
<p>half-an-hour. GOLDEN RICE Put them very little MY_IDENTIFIER book on a glass dish that way. CABBAGE WITH CHEESE Every one and season it up with not enough to make a pat of butter, each round quickly. Or add, instead of fresh lean and let it every now and put it melts</p>
<p>leek, and over it, a half a fireproof cases from burning. CHOU-CROUTE Take the salad you take out the amount of cream is not get in four, about three-and-a-half pints of the middle of this sauce some chopped almonds, chopped parsley and mix it in your pieces of grated cheese</p>
<p>sides. In four or flageolets, and stir in company with flour, and let it out, and pour over all, chop your vinegar to half a lemon--this would do not quite, add the edges. Steep them in a tablespoonful of butter and mustard. Take it in salted water; and, crumbling out</p>
<p>care that it in which you have seasoned with an equal size, mix MY_IDENTIFIER these are well with the fermentation has a custard. Put the top with a very carefully, so that you have added at a sieve; or, for at home than thick. Then fry the custard as you prepare</p>
<p>stuffing into a fireproof dish, and fry them to picnics, or marjoram with this MY_IDENTIFIER way besides parsley. Roll them out neatly with vanilla, a tablespoonful of mustard, pepper and salt, then pour it all cooked, and it to be ready to keep it simmer it over and salt. The original</p>
</body>
</html>

I need to find the <p> tags whose text contains "MY_IDENTIFIER", manipulate that text, and then replace it with new text.

I already know how to find the paragraph tags and their text using a regex, and I can loop over the matches and manipulate the text as required. What I would like to know is how to replace each matched item with other text.

In the above example, "MY_IDENTIFIER" appears in the 2nd, 6th, 9th and 10th paragraphs. Let's say I would like to replace the 2nd paragraph with

<p>2nd paragraph text</p>

and the 6th paragraph with

<p>6th paragraph text</p>

and so on...

The code I have so far:

Imports System.Text.RegularExpressions

Module modMain

    Sub main()
        Dim fileContents As String
        fileContents = My.Computer.FileSystem.ReadAllText("C:\temp\a.html")
        Dim paras As MatchCollection = Regex.Matches(fileContents, "<p>(.+?MY_IDENTIFIER.+?)</p>")
        Dim TxtFound As String
        For Each oMatch As Match In paras
            TxtFound = oMatch.Groups(1).Value
            'do some manipulations with txtfound
            '...
            'replace the txtfound with some other text

        Next

        'Save the file again
    End Sub
End Module

Any help appreciated.

Upvotes: 0

Views: 329

Answers (1)

user387049

Reputation: 6867

I would first find all paragraphs via a global match (the snippets below are Perl):

my @matches = ($string =~ m!<p>(.*?)</p>!sig);

Then I would loop through, and replace any that contain your identifier:

foreach (@matches) {
  # keep a copy of the original paragraph text for the substitution below
  my $before = $_;

  # if the identifier is found, replace it
  if ($_ =~ s!MY_IDENTIFIER!replacement text!is) {
    # then put the updated text back into the original $string variable;
    # \Q...\E quotes regex metacharacters (., parentheses, etc.) in the old text
    $string =~ s!\Q$before\E!$_!is;
  }
}
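
Since the question's code is VB.NET, here is a rough sketch of how the same find-and-replace might look there, using the Regex.Replace overload that takes a MatchEvaluator so each match can be rewritten in place. The module name, the BuildReplacement helper and the sample string are made up for illustration, and the pattern assumes the paragraph text contains no nested tags:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ParagraphRewriter

    ' Hypothetical helper: builds the new text for the n-th marked paragraph.
    ' Replace the body with whatever manipulation is actually needed.
    Function BuildReplacement(original As String, index As Integer) As String
        Return "replacement text " & index.ToString()
    End Function

    ' Rewrites every <p>...</p> whose text contains MY_IDENTIFIER.
    Function RewriteMarkedParagraphs(html As String) As String
        Dim index As Integer = 0
        ' [^<]* stops at the next tag, so a match cannot run across paragraphs.
        ' Assumption: the paragraph text itself contains no nested tags.
        Dim pattern As String = "<p>([^<]*MY_IDENTIFIER[^<]*)</p>"
        Return Regex.Replace(html, pattern,
            Function(m As Match) As String
                ' Called once per match; the return value replaces the whole match.
                index += 1
                Return "<p>" & BuildReplacement(m.Groups(1).Value, index) & "</p>"
            End Function)
    End Function

    Sub Main()
        ' Made-up sample input; only the 2nd and 3rd paragraphs carry the marker.
        Dim sample As String = "<p>plain</p><p>a MY_IDENTIFIER b</p><p>MY_IDENTIFIER c</p>"
        Console.WriteLine(RewriteMarkedParagraphs(sample))
    End Sub
End Module
```

In the question's setting, the string read with My.Computer.FileSystem.ReadAllText would be passed to RewriteMarkedParagraphs, and the result could then be written back with My.Computer.FileSystem.WriteAllText(path, result, False). Note that the counter numbers the marked paragraphs (1st match, 2nd match, ...), not their position among all paragraphs. Doing the replacement inside the evaluator also avoids searching the document a second time for the old text.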

Upvotes: 0
