dnclem

Reputation: 2838

Regex to remove everything after a certain character (comment)

I have a regex that I use to remove everything after a specific character, the semicolon.

        var regex = new Regex(@";(.*)", RegexOptions.Singleline);
        tb.Text = regex.Replace(tb.Text, "");

It seems to work fine, but sometimes it removes the entire text of the text box. For example, all of this text is removed:

;fgkdfgdfgd
;dfgdfkghdfgdf
;sdgfsdfsdfsdf
;dfgdfgdfg

#dont remove this          ;fgdfgdfg

The "#dont remove this" part should stay intact because it comes before the semicolon on its line, but it is removed too. Is something wrong with my regex?

The idea is to remove or trim all comments from a file.

Upvotes: 1

Views: 4904

Answers (5)

Scott Rippey

Reputation: 15810

The issue is very simple: you have misunderstood RegexOptions.Singleline.

Singleline makes . match every character, including line breaks; by default . matches anything except \n. Read more about RegexOptions in the .NET documentation.

Your current result is a single match (from the first ; to the end of the entire string).
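To see this, here is a small sketch (assuming C# 9+ top-level statements) that counts the matches of the question's pattern on the question's sample text, with and without Singleline. The sample uses plain \n line breaks for simplicity; the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question, joined with "\n" (an assumption for simplicity).
string text = ";fgkdfgdfgd\n;dfgdfkghdfgdf\n;sdgfsdfsdfsdf\n;dfgdfgdfg\n\n#dont remove this          ;fgdfgdfg";

// With Singleline, '.' also matches '\n', so the greedy ".*" runs from the
// first ';' to the end of the string: a single match that covers everything.
int withSingleline = new Regex(@";(.*)", RegexOptions.Singleline).Matches(text).Count;

// Without it, '.' stops at '\n', so each comment is a separate match.
int withoutSingleline = new Regex(@";(.*)").Matches(text).Count;

Console.WriteLine($"Singleline: {withSingleline}, default: {withoutSingleline}");
// prints "Singleline: 1, default: 5"
```

Because the one Singleline match starts at the very first character, replacing it with "" empties the whole text box.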

Just remove RegexOptions.Singleline and your pattern will match each comment only up to the end of its line.
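A minimal sketch of that fix, assuming C# 9+ top-level statements and a WinForms-style TextBox whose lines end in \r\n. It makes one tweak beyond the answer: in .NET, . matches any character except \n, so it still matches \r, and ;(.*) would swallow the \r of every \r\n line ending. The sketch uses ;[^\r\n]* instead, which stops before either character.

```csharp
using System;
using System.Text.RegularExpressions;

// Text as a multiline TextBox would typically hold it (lines separated by "\r\n").
string text = ";fgkdfgdfgd\r\n;dfgdfgdfg\r\n\r\n#dont remove this ;fgdfgdfg\r\nkeep;drop";

// No RegexOptions.Singleline. The character class excludes both '\r' and '\n',
// so each match ends just before the line break and the "\r\n" pairs survive.
var comment = new Regex(@";[^\r\n]*");
string result = comment.Replace(text, "");

Console.WriteLine(result);
```

In the real form, the last two lines would become tb.Text = comment.Replace(tb.Text, "");.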

Upvotes: 1

Yahia

Reputation: 70369

Try this (updated after a comment):

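using System.Linq;

// For each line, keep only the text before the first ';' (lines without one are unchanged).
tb.Lines = (
    from l in tb.Lines
    let x = l.IndexOf(';')
    select (x >= 0 ? l.Substring(0, x) : l)
).ToArray();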

This should run faster than the Regex too...
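Since the goal in the question is to strip comments from a file, the same per-line approach can also work on a file directly. This is a sketch assuming C# 9+ top-level statements; StripComments is a hypothetical helper name, and it additionally calls TrimEnd so the spaces left in front of a removed comment are trimmed too (note that this trims trailing whitespace on every line).

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: cut each line at its first ';', then trim trailing whitespace.
static string[] StripComments(string[] lines) =>
    lines.Select(l =>
    {
        int x = l.IndexOf(';');
        return (x >= 0 ? l.Substring(0, x) : l).TrimEnd();
    }).ToArray();

// Illustrative usage on a temporary file standing in for the real one.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { ";fgkdfgdfgd", "", "#dont remove this          ;fgdfgdfg" });
File.WriteAllLines(path, StripComments(File.ReadAllLines(path)));

string[] cleaned = File.ReadAllLines(path);
File.Delete(path);
Console.WriteLine(string.Join("|", cleaned)); // prints "||#dont remove this"
```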

Upvotes: 6

Marco

Reputation: 57573

You could do it easily using this:

int i = tb.Text.IndexOf(';');
if (i >= 0) tb.Text = tb.Text.Substring(0, i); // IndexOf returns -1 when there is no ';'

This should run faster than using Regex...

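If your TextBox is multiline, you could use:

using System;
using System.Linq;

string s = TextBox1.Text;
// Cut each line at its first ';'; lines without one are kept as-is.
var lines = s.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
             .Select(p => { int i = p.IndexOf(';'); return i >= 0 ? p.Substring(0, i) : p; });
TextBox1.Text = string.Join(Environment.NewLine, lines);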

Upvotes: 0

Alan Moore

Reputation: 75222

RegexOptions.Singleline doesn't limit the match to a single line as you might expect. In fact, its purpose is just the opposite. It allows the . metacharacter to match newlines, making it easier to find matches that span across multiple lines. Just drop that and you should be fine.

Upvotes: 2

Lee Gunn

Reputation: 8656

It's because you are using RegexOptions.Singleline, so . matches newlines too.

Upvotes: 2
