Isyar Harun

Reputation: 35

What regex must I use to split this?

I am very new to C#.

I want a program that works like this:

input : There are 4 numbers in this string 40, 30, and 10

output :

there = string
are = string
4 = number
numbers = string
in = string
this = string
40 = number
, = symbol
30 = number
, = symbol
and = string
10 = number

I tried this:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "There are 4 numbers in this string 40, 30, and 10.";
        // Attempt: split on a run of non-digits followed by whitespace.
        string[] numbers = Regex.Split(input, @"(\D+)(\s+)");
        foreach (string value in numbers)
        {
            Console.WriteLine(value);
        }
    }
}

But the output is different from what I want. Please help me, I am stuck.

Upvotes: 2

Views: 298

Answers (5)

Ahmad Mageed

Reputation: 96477

You could split using this pattern: @"(,)\s?|\s"

This splits on a comma, but preserves it since it is within a group. The \s? serves to match an optional space but excludes it from the result. Without it, the split would include the space that occurred after a comma. Next, there's an alternation to split on whitespace in general.
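To see the effect of the capture group in isolation, here is a minimal sketch that runs Regex.Split on a shortened input with and without the parentheses around the comma. The class name SplitDemo and the helper SplitAndJoin are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits the input with the pattern and joins the pieces with '|'.
    public static string SplitAndJoin(string input, string pattern)
    {
        return string.Join("|", Regex.Split(input, pattern));
    }

    static void Main()
    {
        string input = "40, 30, and 10";

        // Without a group, the comma is consumed as a delimiter and dropped.
        Console.WriteLine(SplitAndJoin(input, @",\s?|\s"));   // 40|30|and|10

        // With the comma in a capture group, Regex.Split keeps it as its own element.
        Console.WriteLine(SplitAndJoin(input, @"(,)\s?|\s")); // 40|,|30|,|and|10
    }
}
```

Note that two spaces in a row would still produce an empty element, so real input may need those filtered out before indexing the first character.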

To categorize the values, we can take the first character of the string and check for the type using the static Char methods.

string input = "There are 4 numbers in this string 40, 30, and 10";
var query = Regex.Split(input, @"(,)\s?|\s")
                 .Select(s => new
                 {
                     Value = s,
                     Type = Char.IsLetter(s[0]) ?
                             "String" : Char.IsDigit(s[0]) ?
                             "Number" : "Symbol"
                 });
foreach (var item in query)
{
    Console.WriteLine("{0} : {1}", item.Value, item.Type);
}

To use the Regex.Matches method instead, this pattern can be used: @"\w+|,"

var query = Regex.Matches(input, @"\w+|,").Cast<Match>()
                 .Select(m => new
                 {
                     Value = m.Value,
                     Type = Char.IsLetter(m.Value[0]) ?
                             "String" : Char.IsDigit(m.Value[0]) ?
                             "Number" : "Symbol"
                 });

Upvotes: 1

ΩmegaMan

Reputation: 31596

The regex parser has an if conditional and the ability to group items into named capture groups, both of which I will demonstrate.

Here is an example where the pattern looks for symbols first (only a comma here; add more symbols to the set [,] as needed), then numbers, and drops the rest into words.

string text = @"There are 4 numbers in this string 40, 30, and 10";
string pattern = @"
(?([,])            # If a comma (or any other symbol added to the set) comes next, it is a symbol
  (?<Symbol>[,])   # Then match the symbol
 |                 # Else it is not a symbol
  (?(\d+)             # If a number comes next
    (?<Number>\d+)    # Then match the number
   |                  # Else it is not a number
    (?<Word>[^\s]+)   # So it must be a word
   ) 
)
";


// IgnorePatternWhitespace only lets us lay out and comment the pattern;
// it does not affect how the text is processed.
Regex.Matches(text, pattern, RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select (mt => 
    {
        if (mt.Groups["Symbol"].Success)
            return  "Symbol found:     " + mt.Groups["Symbol"].Value;

        if (mt.Groups["Number"].Success) 
            return  "Number found:  " + mt.Groups["Number"].Value;

        return "Word found:     " + mt.Groups["Word"].Value;
    }
     )
     .ToList() // Only needed so ForEach can print the results; remove it otherwise
     .ForEach(rs => Console.WriteLine (rs));

/* Result
Word found:     There
Word found:     are
Number found:  4
Word found:     numbers
Word found:     in
Word found:     this
Word found:     string
Number found:  40
Symbol found:     ,
Number found:  30
Symbol found:     ,
Word found:     and
Number found:  10
*/

Once the regex has tokenized the text into matches, we use LINQ to extract those tokens by checking which named capture group succeeded. In this example we take the successful capture group and project it into a string to print out for viewing.

For more information, see my blog post "Regular Expressions and the If Conditional", where I discuss the regex if conditional.

Upvotes: 2

Adrian Iftode

Reputation: 15663

If you only want to get the numbers:

var reg = new Regex(@"\d+");
var matches = reg.Matches(input );
var numbers = matches
        .Cast<Match>()
        .Select(m=>Int32.Parse(m.Groups[0].Value));

To get your output:

var regSymbols = new Regex(@"(?<number>\d+)|(?<string>\w+)|(?<symbol>(,))");
var sMatches = regSymbols.Matches(input );
var symbols = sMatches
    .Cast<Match>()
    .Select(m=> new
    {                  
       Number = m.Groups["number"].Value,
       String = m.Groups["string"].Value,
       Symbol = m.Groups["symbol"].Value
     })
    .Select(
      m => new 
      {
        Match = !String.IsNullOrEmpty(m.Number) ? 
                    m.Number : !String.IsNullOrEmpty(m.String) 
                            ? m.String : m.Symbol,
        MatchType = !String.IsNullOrEmpty(m.Number) ? 
                    "Number" : !String.IsNullOrEmpty(m.String) 
                            ? "String" : "Symbol"
      }
    );

Edit: If there are more symbols than just a comma, you can group them in a character class, as @Bogdan Emil Mariesan did, and the regex becomes:

@"(?<number>\d+)|(?<string>\w+)|(?<symbol>[,.\?!])"

Edit 2: To get the output lines in the "value = type" format:

var outputLines = symbols.Select(m=>
                            String.Format("{0} = {1}", m.Match, m.MatchType));

Upvotes: 0

Woot4Moo

Reputation: 24316

You can very easily do this like so:

string[] tokens = Regex.Split(input, " ");

foreach (string token in tokens)
{
    if (token.Length == 0)
    {
        continue; // consecutive spaces produce empty tokens; skip them
    }

    int number;
    if (Int32.TryParse(token, out number))
    {
        Console.WriteLine(token + " = number");
    }
    else if (token.Length == 1 && !Char.IsLetterOrDigit(token[0]))
    {
        Console.WriteLine(token + " = symbol");
    }
    else
    {
        Console.WriteLine(token + " = string");
    }
}

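I do not have an IDE handy to test that this compiles. Essentially what you are doing is splitting the input on spaces and then performing some comparisons to determine whether each token is a symbol, string, or number. Note that because the split is on spaces only, a comma attached to a number, as in "40,", stays part of that token.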

Upvotes: 0

Bogdan Emil Mariesan

Reputation: 5647

Well to match all numbers you could do:

[\d]+

For the strings:

[a-zA-Z]+

And for some of the symbols, for example:

[,.?\[\]\\\/;:!\*]+
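One way these three patterns could be put together is to join them with alternation, give each a named group, and check which group matched. The following is only a sketch under that assumption: the class name TokenDemo, the method Classify, and the group names are chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TokenDemo
{
    // The three character classes from above, joined by alternation.
    // The group names (number, word, symbol) are chosen for this sketch only.
    static readonly Regex Tokens = new Regex(
        @"(?<number>[\d]+)|(?<word>[a-zA-Z]+)|(?<symbol>[,.?\[\]\\\/;:!\*]+)");

    // Returns one "value = type" line per token found in the input.
    public static List<string> Classify(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Tokens.Matches(input))
        {
            string type = m.Groups["number"].Success ? "number"
                        : m.Groups["word"].Success ? "string"
                        : "symbol";
            lines.Add(m.Value + " = " + type);
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in Classify("There are 4 numbers in this string 40, 30, and 10"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the symbol class is followed by +, a run of adjacent symbols such as "?!" is reported as a single symbol token; drop the + if each character should be listed separately.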

Upvotes: 0
