Saqib

Reputation: 2273

How to get the next N numbers after a specific position in a string of numbers - Regex

288007  327920  374740 000368   044575  082865 680798
717374  755879  811106  855460  920577  953515  996819 ......

I have a string containing thousands of 6-digit numbers, and I want to use a regular expression to extract N numbers that come after the Mth number.

Let's say I need to extract three numbers after the 4th number; then the result should be 044575 082865 680798.

Another example: if I need to extract 2 numbers after the 10th number, the result should be 855460 920577.

I don't know whether this is possible with regex. I think a For Each statement may be useful in my case.

So far I can only extract each six-digit number, using the code below.

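Imports System.Text.RegularExpressions

' (?<!\d) and (?!\d) make sure each match is exactly six digits, not part of a longer run of digits
Dim NumberMatchCollection As MatchCollection = Regex.Matches("String containing numbers", "(?<!\d)\d{6}(?!\d)")
For Each NumberMatch As Match In NumberMatchCollection
    Dim ItemNumber As String = NumberMatch.Value
Next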

Edit: I cannot guarantee whether each separator will be a single space, a double space, a tab or something else. I can only guarantee that every number is always 6 digits long and that the numbers are separated by one or more spaces or tabs.
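The For Each idea mentioned in the question can work: because the regex already ignores whatever separators sit between the numbers, counting the matches is enough. Below is a minimal C# sketch (the logic ports directly to VB.NET). `NumbersAfter` and its `after`/`count` parameters are hypothetical names, and the sample `data` string mixing spaces, a tab and a line break is an assumption about what the real input looks like.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the `count` six-digit numbers that follow the
// `after`-th number, using the question's regex plus a For Each-style counter.
static List<string> NumbersAfter(string input, int after, int count)
{
    var result = new List<string>();
    int index = 0; // 0-based position of the current match
    foreach (Match m in Regex.Matches(input, @"(?<!\d)\d{6}(?!\d)"))
    {
        if (index >= after && index < after + count)
            result.Add(m.Value);
        index++;
        if (index >= after + count)
            break; // everything requested has been collected, stop scanning
    }
    return result;
}

// Assumed sample input: mixed spaces, a tab and a line break between numbers
string data = "288007  327920  374740 000368   044575  082865 680798\n" +
              "717374\t755879  811106  855460  920577  953515  996819";

Console.WriteLine(string.Join(" ", NumbersAfter(data, 4, 3)));  // 044575 082865 680798
Console.WriteLine(string.Join(" ", NumbersAfter(data, 10, 2))); // 855460 920577
```

Breaking out of the loop early means numbers near the start are cheap to fetch, while numbers near the end still require scanning every earlier match.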

Upvotes: 0

Views: 1507

Answers (4)

Andrew Morton

Reputation: 25013

You can .Split() the string and use LINQ extension methods on the resulting array:

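using System;
using System.Linq;
using System.Text;

// some test data: 000001 to 010000, each followed by two spaces (about 1 time in 5) or a tab
var rand = new Random();
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= 10000; i++)
{
    sb.Append(i.ToString("000000") + ((rand.Next(5) == 1) ? "  " : "\t"));
}
string s = sb.ToString();

// split on spaces and tabs, dropping the empty entries produced by runs of separators,
// then skip the first 10 numbers and take the next 3
string portion = string.Join("  ", s.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries).Skip(10).Take(3));

Console.WriteLine(portion); // outputs "000011  000012  000013"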

Note: for the first number you would .Skip(0).
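If the separators might be any mix of spaces, tabs or line breaks, as the question's edit says, a sketch of the same Split/Skip/Take approach can rely on `String.Split` treating every white-space character as a delimiter when the separator array is empty. `TakeAfter` is a hypothetical helper name, and the sample `data` string is assumed input; `Skip(after)` and `Take(count)` play the same role as `.Skip(10).Take(3)` above.

```csharp
using System;
using System.Linq;

// Hypothetical helper: with an empty separator array, String.Split uses white space
// (spaces, tabs, line breaks, ...) as the delimiters; RemoveEmptyEntries then
// discards the empty strings produced by runs of several separators.
static string[] TakeAfter(string input, int after, int count) =>
    input.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
         .Skip(after)   // drop the first `after` numbers
         .Take(count)   // keep the next `count` numbers
         .ToArray();

// Assumed sample input with mixed separators
string data = "288007  327920  374740 000368   044575  082865 680798\n" +
              "717374\t755879  811106  855460  920577  953515  996819";

Console.WriteLine(string.Join(" ", TakeAfter(data, 4, 3)));  // 044575 082865 680798
Console.WriteLine(string.Join(" ", TakeAfter(data, 10, 2))); // 855460 920577
```

The whole string is still split into an array on every call, so if many ranges are needed from the same string it is cheaper to split once and keep the array.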

However, if your string is in the rigid format you show (assuming the varying numbers of spaces are typos; thanks @ErikE), Coenraad's method of calculating where the required substring starts and how many characters to take would be more efficient. I'll leave it to Coenraad to expand that into an answer, as it would not be fair for me to take the points.

I tried repeatedly to make the regex method consistently fast, but found that its speed depended strongly on which numbers you want to retrieve:

[Chart: "Time taken" (Stopwatch ticks) against "First item to retrieve" for the String.Split and RegEx methods]

For anyone wanting to test that, I put a default Chart on a Form and used this code:

Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Windows.Forms.DataVisualization
Imports System.Windows.Forms.DataVisualization.Charting

Public Class Form1

    Sub DoStuff()

        Dim ser1 As New Series With {.Name = "String.Split"}
        Dim ser2 As New Series With {.Name = "RegEx"}

        Dim sb As New StringBuilder()

        For i As Integer = 1 To 10000
            sb.Append(i.ToString("000000") + "  ")
        Next
        Dim s As String = sb.ToString()

        Dim sw As New Stopwatch()

        Dim itemsToTake As Integer = 50

        For firstItem = 1 To 9000 Step 100

            sw.Restart()

            Dim portion As String = String.Join(" ", s.Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Skip(firstItem - 1).Take(itemsToTake))

            sw.Stop()
            ser1.Points.AddXY(firstItem -1, sw.ElapsedTicks)

            Dim pattern = "^(?:\d+\s+){" + (firstItem - 1).ToString() + "}((\d+)\s+){" + itemsToTake.ToString() + "}"
            Dim re = New Regex(pattern)

            sw.Restart()
            Dim matches = re.Matches(s)
            Dim cs = matches(0).Groups(2).Captures ' group 2 holds one capture per retrieved number
            sw.Stop()
            ser2.Points.AddXY(firstItem - 1, sw.ElapsedTicks)

        Next

        Chart1.Series.Clear()
        Chart1.Series.Add(ser1)
        Chart1.Series(0).ChartType = SeriesChartType.Line
        Chart1.Series.Add(ser2)
        Chart1.Series(1).ChartType = SeriesChartType.Line

        Chart1.ChartAreas(0).AxisX.IsMarginVisible = False
        Chart1.ChartAreas(0).AxisX.Title = "First item to retrieve"
        Chart1.ChartAreas(0).AxisY.Title = "Time taken"

    End Sub

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        DoStuff()

    End Sub

End Class

Upvotes: 0

the_lotus

Reputation: 12748

To expand on my comment: this assumes that the data is evenly spaced.

If each number has 6 digits with a single space in between, then the numbers after the 4th one start at (zero-based) character position (6+1)*4, and if you want 3 numbers you just need to fetch (6+1)*3 characters.

Imports System

Module Program
    Sub Main()
        Dim str As String = "288007 327920 374740 000368 044575 082865 680798 717374 755879 811106 855460 920577 953515 996819"

        Dim startingNumber As Integer = 4
        Dim amountToFetch As Integer = 3

        ' 7 = [size of each number] + [delimiter length]
        ' 7 = 6 + 1

        ' Prints "044575 082865 680798 " (including the trailing space)
        Console.WriteLine(str.Substring(7 * startingNumber, 7 * amountToFetch))
        Console.ReadLine()
    End Sub
End Module
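A slightly more general sketch of the same offset arithmetic, assuming every number has exactly `width` digits followed by exactly `sepLength` separator characters (which does not hold for the question's mixed spacing). `FixedWidthSlice` and its parameters are hypothetical; the sketch also trims the trailing separator and clips the length when the last number has no separator after it.

```csharp
using System;

// Hypothetical helper for the fixed-width case only: each number is `width`
// digits long and is followed by exactly `sepLength` separator characters.
static string FixedWidthSlice(string input, int after, int count,
                              int width = 6, int sepLength = 1)
{
    int stride = width + sepLength;   // 7 for "dddddd "
    int start = stride * after;       // zero-based index of number (after + 1)
    if (start >= input.Length) return "";
    // clip so that requesting the final numbers does not run past the end
    int length = Math.Min(stride * count, input.Length - start);
    return input.Substring(start, length).TrimEnd(); // drop the trailing separator
}

string str = "288007 327920 374740 000368 044575 082865 680798 717374 " +
             "755879 811106 855460 920577 953515 996819";

Console.WriteLine(FixedWidthSlice(str, 4, 3));  // 044575 082865 680798
Console.WriteLine(FixedWidthSlice(str, 12, 2)); // 953515 996819
```

Because it never scans the skipped numbers, its cost does not depend on the position requested; the trade-off is that a single separator of a different length shifts every later offset.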

Upvotes: 1

Les

Reputation: 10605

If you would like a regex solution in C#, the following code handles the example of 3 numbers after the 4th number.

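using System;
using System.Text.RegularExpressions;

var st = @"288007  327920  374740 000368   044575  082865 680798
          717374  755879  811106  855460  920577  953515  996819";
// skip the first 4 numbers, then capture the next 3 in the named group "x"
var pattern = @"^(\d+\s+){4}((?<x>\d+)\s+){3}";
var matches = Regex.Matches(st, pattern, RegexOptions.Singleline);
// a repeated group keeps one Capture per repetition
foreach (Capture m in matches[0].Groups["x"].Captures)
    Console.WriteLine("value={0}", m.Value);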

(Edit: removed one group per comment below)
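To choose the numbers at run time, the pattern above can be built from the two counts. This is a sketch with a hypothetical `RegexTake` helper; the `^\s*` prefix and the `(?:\s+|$)` alternative are assumptions added here (they are not in the answer) so that leading white space and a requested number at the very end of the string are both handled. The sample `data` string is assumed input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper generalising the pattern above: skip `after` numbers,
// then capture the next `count` numbers in the named group "x".
// {{{after}}} in the interpolated string produces a quantifier such as {4}.
static string[] RegexTake(string input, int after, int count)
{
    string pattern = $@"^\s*(?:\d+\s+){{{after}}}(?:(?<x>\d+)(?:\s+|$)){{{count}}}";
    Match m = Regex.Match(input, pattern);
    if (!m.Success) return Array.Empty<string>(); // fewer numbers than requested
    return m.Groups["x"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
}

// Assumed sample input with mixed separators
string data = "288007  327920  374740 000368   044575  082865 680798\n" +
              "717374\t755879  811106  855460  920577  953515  996819";

Console.WriteLine(string.Join(" ", RegexTake(data, 4, 3)));  // 044575 082865 680798
Console.WriteLine(string.Join(" ", RegexTake(data, 12, 2))); // 953515 996819
```

Unlike the Split-based answer, this returns nothing at all when fewer than `count` numbers remain, because the whole pattern has to match.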

Upvotes: 0

cheekibreeki

Reputation: 113

Wouldn't this be simpler using maths?

The three numbers after the 4th number start at character 7 * 4 and span 7 * 3 characters.
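Worked through for the question's first example, assuming each 6-digit number is followed by exactly one separator character (so each number occupies 7 characters) and counting character positions from zero:

```latex
\begin{aligned}
\text{start}           &= 7 \times 4 = 28 \\
\text{length}          &= 7 \times 3 = 21 \\
\text{end (exclusive)} &= 28 + 21 = 49
\end{aligned}
```

In the single-spaced string from the_lotus's answer, characters 28 to 48 are `044575 082865 680798 `, including one trailing space.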

Upvotes: 2
