Programming Newbie

Reputation: 1227

How to pull middle numbers from a string of numbers using regular expressions

I have a data field that contains large numbers in two formats:

553000.468...705.46.0000000        <- Format 1
553000.469.5501000.704.47.0000000  <- Format 2

I only need the three digits in the middle, which will be 703, 704, or 705.

I was able to pull these digits by using regular expressions like so:

Regex num = new Regex(@"(?<number>7\d+) ?");
Match number = num.Match(numb);
if (number.Success)
    Console.WriteLine(num.Match(numb).Result("${number}")); 

However, that only works if there isn't a 7 preceding the middle numbers.

It seems to me that the best way to approach it would be to focus on the ".". The problem is that I can't figure out how to match characters AFTER the ".". I can pull the numbers prior to the 1st "." by doing this:

Regex num = new Regex(@"(?<number>.\d+) ?");
Match number = num.Match(numb);
if (number.Success)
    Console.WriteLine(num.Match(numb).Result("${number}")); 

This gives me everything prior to the period. I'm using the cheat sheet found here, but it doesn't show how to match characters after the ".". If I can figure out how to do that, I can repeat the pattern until I get to the numbers I need and then use the above code to get rid of the numbers after them. That may not be the most effective way to do it, but I've never used regular expressions before and, to be honest, I find them very confusing.

EDIT:

I've been told I need to provide more examples or explain it better. I have a database table with a column called Glsec. This field contains numbers in two formats; 553000.468...705.46.0000000 and 553000.469.5501000.704.47.0000000 are examples of the two formats.

In the 553000.468...705.46.0000000 format I only need the numbers 703, 704, or 705 found in the first group of numbers AFTER the ... (14th character from the left)

In the 553000.469.5501000.704.47.0000000 format I only need the numbers 703, 704, or 705 found in the 4th group of numbers from the left (starting at the 20th character).

That group of three digits may contain any number between 000 and 999, but I only need those three digits. It is also possible that 703, 704, or 705 randomly pops up in the other groups of numbers, so I have to make sure I am grabbing the numbers from the correct position.

I hope that explains it better.

Upvotes: 0

Views: 2159

Answers (7)

afuzzyllama

Reputation: 6548

Why not use the Split() method on the "."? Then you can get the number you want by index if it always has the same position, or you can loop through the array to find the numbers that you want.
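
A minimal sketch of the Split() idea, assuming the wanted group is always the third group from the end once empty entries are dropped. That holds for both sample strings in the question, but it is an assumption about the rest of the data. The class and method names here are made up for illustration.

```csharp
using System;

public static class GlsecSplit
{
    // Hypothetical helper: returns the third-from-last non-empty group,
    // or null when the input has fewer than three non-empty groups.
    public static string MiddleGroup(string glsec)
    {
        // RemoveEmptyEntries drops the empty strings that "..." would produce.
        string[] parts = glsec.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length >= 3 ? parts[parts.Length - 3] : null;
    }
}
```

With the two sample values, GlsecSplit.MiddleGroup("553000.468...705.46.0000000") gives "705" and GlsecSplit.MiddleGroup("553000.469.5501000.704.47.0000000") gives "704". Counting from the end is used because the index from the start differs between the two formats.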

Upvotes: 5

Barracoder

Reputation: 3764

If you get tired of regex....

    string[] formats = new[] {"553000.468...705.46.0000000", "553000.469.5501000.704.47.0000000"};
    var results = from format in formats
                  from sub in format.Split('.')
                  where new[] { "703","704","705" }.Contains(sub)
                  select sub;

Upvotes: 1

Colonel Panic

Reputation: 137622

Use String.Contains to test for the three cases. Assuming the string is the variable s:

s.Contains(".703.")
s.Contains(".704.")
s.Contains(".705.")

Upvotes: 2

gdoron

Reputation: 150263

I only need the three digits in the middle that include 703, 704, or 705

You can use this:

@"\.(70[3-5])\."

If those are the only valid values.

BTW, who said regex is the best way to go with your problem?
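
For reference, a rough sketch of how that pattern might be applied. The class and method names are hypothetical, and it assumes that returning the first matching group is acceptable. It does not check the position of the group, so a 703, 704, or 705 in an earlier group would be returned instead.

```csharp
using System.Text.RegularExpressions;

public static class GlsecRegex
{
    // A literal period, then 703, 704, or 705 captured in group 1, then a literal period.
    private static readonly Regex Middle = new Regex(@"\.(70[3-5])\.");

    // Hypothetical helper: returns the captured digits, or null when nothing matches.
    public static string Find(string glsec)
    {
        Match m = Middle.Match(glsec);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

GlsecRegex.Find("553000.468...705.46.0000000") gives "705", and GlsecRegex.Find("553000.469.5501000.704.47.0000000") gives "704".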

Upvotes: 3

Guffa

Reputation: 700432

You need to escape the period as \., as . means "any character" in a regular expression.

You can match a specific number of groups, each made of optional digits followed by a period, before the number that you want:

Regex num = new Regex(@"^(?:\d*\.){3}(?<number>\d+)");

or after:

Regex num = new Regex(@"(?<number>\d+)(?:\.\d*){2}$");
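
A small sketch wrapping the end-anchored pattern, with hypothetical class and method names. As far as I can tell, the start-anchored pattern with {3} only matches the second format, because the first format has four periods before the wanted group, while counting from the end works for both sample strings.

```csharp
using System.Text.RegularExpressions;

public static class GlsecFromEnd
{
    // Captures a run of digits that is followed by exactly two more
    // ".digits" groups and then the end of the string.
    private static readonly Regex FromEnd = new Regex(@"(?<number>\d+)(?:\.\d*){2}$");

    // Hypothetical helper: returns the third group from the end, or null if there is no match.
    public static string Find(string glsec)
    {
        Match m = FromEnd.Match(glsec);
        return m.Success ? m.Groups["number"].Value : null;
    }
}
```

GlsecFromEnd.Find("553000.468...705.46.0000000") gives "705", and GlsecFromEnd.Find("553000.469.5501000.704.47.0000000") gives "704".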

Upvotes: 1

Feuerwehrmann

Reputation: 149

The . is a metacharacter in regular expressions, so it will need to be escaped. Is the number starting with 7 always 3 digits?

You can try the following:

    // The escaped period goes before the named group, so the digits after a literal "." are captured.
    Regex num = new Regex(@"\.(?<number>\d+) ?");
    Match number = num.Match(numb);
    if (number.Success)
        Console.WriteLine(number.Groups["number"].Value);

Upvotes: 1

Ben Voigt

Reputation: 283713

How about doing it in two steps? First, find two periods with three digits in between. Then, trim the periods.

Regex num = new Regex(@"\.\d{3}\.");
Match number = num.Match(numb);
if (number.Success)
    Console.WriteLine(number.Value.Trim('.')); 

You may also capture a subset of the match:

Regex num = new Regex(@"\.(\d{3})\.");
Match number = num.Match(numb);
if (number.Success)
    Console.WriteLine(number.Groups[1].Value); 

Upvotes: 1
