Yochai Timmer

Reputation: 49271

C# extract words using regex

I've found a lot of examples of how to check something using regex, or how to split text using regular expressions.

But how can I extract words out of a string?

Example:

aaaa 12312 <asdad> 12334 </asdad>

Let's say I have something like this, and I want to extract all the numbers ([0-9]*) and put them in a list.

Or if I have two different kinds of elements:

aaaa 1234 ...... 1234 ::::: asgsgd

And I want to choose the digits that come after ...... and the words that come after :::::

Can I extract these strings with a single regex?

Upvotes: 3

Views: 10444

Answers (6)

mroach

Reputation: 2478

Something like this will do nicely!

var text = "aaaa 12312 <asdad> 12334 </asdad>";
var matches = Regex.Matches(text, @"\w+");

var arrayOfMatched = matches.Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", arrayOfMatched));

\w+ matches runs of consecutive word characters (letters, digits, and underscores). We then select the values out of the collection of matches and turn them into an array.
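
Note that on the sample input \w+ picks up the letter runs and tag names as well as the numbers. If only the numbers are wanted, a minimal sketch could swap in \d+. The TokenExtractor class and its Extract method are names made up for this illustration, not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, invented for this example
static class TokenExtractor
{
    // Returns every non-overlapping match of the pattern, in order of appearance
    public static List<string> Extract(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}

class Program
{
    static void Main()
    {
        string text = "aaaa 12312 <asdad> 12334 </asdad>";

        // \w+ picks up letters and digits alike
        Console.WriteLine(string.Join(", ", TokenExtractor.Extract(text, @"\w+")));
        // prints: aaaa, 12312, asdad, 12334, asdad

        // \d+ keeps only the runs of digits
        Console.WriteLine(string.Join(", ", TokenExtractor.Extract(text, @"\d+")));
        // prints: 12312, 12334
    }
}
```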

Upvotes: 2

rlb.usa

Reputation: 15041

    //verbatim string (@"...") so the backslash reaches the regex engine unchanged
    Regex phoneregex = new Regex(@"[0-9][0-9][0-9]\-[0-9][0-9][0-9][0-9]");
    String unicornCanneryDirectory = "unicorn cannery 483-8627 cha...";
    String numbersToCall = "";

    //the second argument is the position in the input string at which to start searching,
    //we probably want 0, the first character
    Match matchIterator = phoneregex.Match(unicornCanneryDirectory, 0);
    //Success tells us whether matchIterator currently holds a match or not
    while (matchIterator.Success) {
      //Value is the text that was matched
      String aResult = matchIterator.Value;
      //we could manipulate our match now but I'm going to concatenate them all for later
      numbersToCall += aResult + " ";

      matchIterator = matchIterator.NextMatch();
    }

    // use my concatenated matches now
    String message = "Unicorn rights activists demand more sparkles in the unicorn canneries under the new law...";
    //phoneDialer stands in for whatever consumes the numbers; it is not a real API
    phoneDialer.MassCallWithAutomatedMessage(numbersToCall, message);

http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.match.nextmatch.aspx

Upvotes: 1

Chuck Savage

Reputation: 11955

// \d+ rather than \d*: the latter also matches empty strings, which Convert.ToInt32 cannot parse
Regex itemsRegex = new Regex(@"\d+");
MatchCollection matches = itemsRegex.Matches(text);

int[] values = matches.Cast<Match>().Select(m => Convert.ToInt32(m.Value)).ToArray();

Upvotes: 1

Timwi

Reputation: 66604

In the general case, you can do this using capturing parentheses:

string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
string regex = @"\.\.\.\. (\d+) ::::: (\w+)";
Match m = Regex.Match(input, regex);

if (m.Success) {
    int numberAfterDots = int.Parse(m.Groups[1].Value);
    string wordAfterColons = m.Groups[2].Value;
    // ... Do something with these values
}

But the first part of your question (extracting all the numbers) is a bit easier:

string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
var numbers = Regex.Matches(input, @"\d+")
                   .Cast<Match>()
                   .Select(m => int.Parse(m.Value))
                   .ToList();

Now numbers will be a list of integers.
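
One caveat: int.Parse throws an OverflowException when a run of digits does not fit in an Int32. A rough sketch that skips such values instead, using int.TryParse, might look like the following. The NumberExtractor class and ExtractInts method are names invented for this example, and the long digit run in the sample input is added only to trigger the overflow case:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, invented for this example
static class NumberExtractor
{
    // Collects every run of ASCII digits that fits in an int; oversized runs are skipped
    public static List<int> ExtractInts(string input)
    {
        var result = new List<int>();
        foreach (Match m in Regex.Matches(input, @"[0-9]+"))
        {
            int value;
            if (int.TryParse(m.Value, out value))
            {
                result.Add(value);
            }
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        // The trailing 20-digit run is too large for an int and is dropped
        var numbers = NumberExtractor.ExtractInts("aaaa 1234 ...... 1234 ::::: 99999999999999999999");
        Console.WriteLine(string.Join(", ", numbers)); // prints: 1234, 1234
    }
}
```

Whether to skip, throw, or fall back to long for oversized values depends on the data; skipping is just one option.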

Upvotes: 3

Chris Benard

Reputation: 3215

For your specific examples:

    string firstString = "aaaa 12312 <asdad> 12334 </asdad>";
    Regex firstRegex = new Regex(@"(?<Digits>[\d]+)", RegexOptions.ExplicitCapture);
    if (firstRegex.IsMatch(firstString))
    {
        MatchCollection firstMatches = firstRegex.Matches(firstString);
        foreach (Match match in firstMatches)
        {
            Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
        }
    }

    string secondString = "aaaa 1234 ...... 1234 ::::: asgsgd";
    Regex secondRegex = new Regex(@"([\.]+\s(?<Digits>[\d]+))|([\:]+\s(?<Words>[a-zA-Z]+))", RegexOptions.ExplicitCapture);
    if (secondRegex.IsMatch(secondString))
    {
        MatchCollection secondMatches = secondRegex.Matches(secondString);
        foreach (Match match in secondMatches)
        {
            if (match.Groups["Digits"].Success)
            {
                Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
            }
            if (match.Groups["Words"].Success)
            {
                Console.WriteLine("Words: " + match.Groups["Words"].Value);
            }
        }
    }

Hope that helps. The output is:

Digits: 12312
Digits: 12334
Digits: 1234
Words: asgsgd

Upvotes: 2

JoshBerke

Reputation: 67128

Here's a solution for your first problem:

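    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string data = "aaaa 12312 <asdad> 12334 </asdad>";

            // One or more consecutive digits
            Regex reg = new Regex("[0-9]+");

            foreach (Match match in reg.Matches(data))
            {
                Console.WriteLine(match.Value);
            }

            // Keep the console window open until Enter is pressed
            Console.ReadLine();
        }
    }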

Upvotes: 4
