user7618628


C#: Remove Excess Text From String

Okay, so after looking around here on SO, I have found a solution that meets about 95% of my requirement, although I believe it may need to be redone at this point.

ISSUE

Say I have a value range supplied as "1000 - 1009 ABC1 ABC SOMETHING ELSE" where I just need the 1000 - 1009 part. I need to be able to remove excess characters from the string supplied, even if they truly are accepted characters, but only if they are part of secondary strings with text. (Sorry if that description seems odd, my mind isn't full power today.)

CURRENT SOLUTION

I currently have a simple method using LINQ to return only the accepted characters; however, this returns "1000 - 10091", which is not the range I need. I've thought about looping through the string's individual characters and comparing each to the previous one using IsDigit and IsLetter, but then comes the issue of replacing or removing the unacceptable characters. I think if I gave it a day or two I could figure it out with a clear mind, but it needs to be done by the end of the day, and I am banging my head against the keyboard.

void RemoveExcessText(ref string val) {
    string allowedChars = "0123456789-+>";
    val = new string(val.Where(c => allowedChars.Contains(c)).ToArray());
}


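// Alternatively?
char previousChar = ' ';
for (int i = 0; i < val.Length; i++) {
    if (char.IsLetter(val[i])) {
        previousChar = val[i];
        val = val.Remove(i, 1); // strings are immutable, so keep the returned copy
        i--;                    // re-check the character that shifted into position i
    } else if (char.IsDigit(val[i])) {
        if (char.IsLetter(previousChar)) {
            val = val.Remove(i, 1);
            i--;
        }
    }
}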

But how do I account for whitespace and leave in the +, -, and > characters? I am losing my mind on this one today.

Upvotes: 2

Views: 163

Answers (3)

Jcl

Reputation: 28272

Why not use a regular expression?

Regex.Match("1000 - 1009 ABC1 ABC SOMETHING ELSE", @"^(\d+)([\s\-]+)(\d+)");

Should give you what you want

I made a fiddle
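
As a minimal sketch of how that match might be consumed, assuming you want to know whether the input started with a range at all. The class and method names (RangeExtractor, TryExtractRange) are made up for illustration; only Regex.Match, Match.Success, and Match.Value are real API.

```csharp
using System.Text.RegularExpressions;

public static class RangeExtractor
{
    // Same pattern as above: digits, then whitespace and/or hyphens, then digits,
    // anchored at the start of the input.
    private static readonly Regex RangePattern = new Regex(@"^(\d+)([\s\-]+)(\d+)");

    // Hypothetical helper: returns true and sets 'range' when the input starts
    // with a range; otherwise returns false and sets 'range' to an empty string.
    public static bool TryExtractRange(string input, out string range)
    {
        Match m = RangePattern.Match(input);
        range = m.Success ? m.Value : string.Empty;
        return m.Success;
    }
}
```

Calling RangeExtractor.TryExtractRange("1000 - 1009 ABC1 ABC SOMETHING ELSE", out var range) leaves "1000 - 1009" in range, because the match stops at the last digit before the trailing text.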

Upvotes: 4

freshop

Reputation: 245

You could match this with a regular expression. \d{1,4} matches between one and four decimal digits. It is followed by a space, a hyphen, a space, and one to four digits again, then anything else (.*). Only the part inside the parentheses is captured as $1 and kept in the result.

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var pattern = @"(^\d{1,4} - \d{1,4}).*";
        string input = ("1000 - 1009 ABC1 ABC SOMETHING ELSE");
        string replacement = "$1";
        string result = Regex.Replace(input, pattern, replacement);
        Console.WriteLine(result);
    }
}

https://dotnetfiddle.net/cZGlX4
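
One thing worth keeping in mind with this approach: Regex.Replace returns the input unchanged when the pattern does not match, so non-range input would pass straight through. A hedged sketch of guarding against that follows; RangeTrimmer and TrimToRange are hypothetical names, and returning an empty string on a non-match is just one possible choice.

```csharp
using System.Text.RegularExpressions;

public static class RangeTrimmer
{
    // Same pattern as in the answer above.
    private const string Pattern = @"(^\d{1,4} - \d{1,4}).*";

    // Hypothetical helper: returns the leading range, or an empty string when
    // the input does not match. Regex.Replace on its own would return the
    // original input untouched in that case.
    public static string TrimToRange(string input)
    {
        return Regex.IsMatch(input, Pattern)
            ? Regex.Replace(input, Pattern, "$1")
            : string.Empty;
    }
}
```

Whether an empty string, null, or an exception is the right response to unmatched input depends on how the caller treats bad data.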

Upvotes: 0

Caius Jard

Reputation: 74660

You use a regular expression with a capturing group:

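Regex r = new Regex("^(?<v>[-0-9 ]+)");

This means "from the start of the input string (^) match [0 to 9 or space or hyphen] and keep going for as many occurrences of these characters as are available (+) and store it into variable v (?<v>)". The quantifier has to be the greedy +; a lazy +? would stop after the first character. The captured value can also end in a trailing space, so you may want to Trim() it.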

We get it out like this:

r.Matches(input)[0].Groups["v"].Value

Note, though, that if the input string doesn't match, the match collection will be empty and indexing it with [0] will throw an exception. You might want to make it more robust with some extra error checking:

MatchCollection mc = r.Matches(input);
if (mc.Count > 0) // MatchCollection exposes Count, not Length
  MessageBox.Show(mc[0].Groups["v"].Value);
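
The choice between a lazy and a greedy quantifier matters a lot for a pattern like this, since nothing follows the character class to force a lazy match to grow. A small sketch comparing the two, with QuantifierDemo, Lazy, and Greedy as illustrative names only:

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Lazy: +? takes as few characters as possible, which here is exactly one.
    public static string Lazy(string input) =>
        Regex.Match(input, "^(?<v>[-0-9 ]+?)").Groups["v"].Value;

    // Greedy: + keeps consuming digits, spaces, and hyphens until something
    // else appears; Trim() drops the space left before the trailing text.
    public static string Greedy(string input) =>
        Regex.Match(input, "^(?<v>[-0-9 ]+)").Groups["v"].Value.Trim();
}
```

For "1000 - 1009 ABC1 ABC SOMETHING ELSE", Lazy returns "1" while Greedy returns "1000 - 1009".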

Upvotes: 1
