Reputation: 2273
288007 327920 374740 000368 044575 082865 680798
717374 755879 811106 855460 920577 953515 996819 ......
I have a string containing thousands of 6-digit numbers, and I want to extract N numbers following the Mth number using a regular expression.
For example, if I need to extract three numbers after the 4th number, the result should be 044575 082865 680798.
As another example, if I need to extract 2 numbers after the 10th number, the result should be 855460 920577.
I don't know whether this is possible with regex alone; I think a For Each statement may be needed in my case.
So far I am only able to extract each six-digit number with the code below.
Dim NumberMatchCollection As MatchCollection = Regex.Matches("String containing numbers", "(?<!\d)\d{6}(?!\d)")
For Each NumberMatch As Match In NumberMatchCollection
Dim ItemNumber As String = NumberMatch.Value
Next
Edit: I cannot guarantee that every separator will be a single space; it may be a double space, a tab or something else. I can only guarantee that each number is always 6 digits long and that the numbers are separated by space(s) or tab(s).
Upvotes: 0
Views: 1507
Reputation: 25013
You can .Split() the string and use LINQ extension methods on the resulting array:
// some test data...
var rand = new Random();
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= 10000; i++)
{
    sb.Append(i.ToString("000000") + ((rand.Next(5) == 1) ? " " : "\t"));
}
string s = sb.ToString();
// split on spaces and tabs, then skip the first 10 numbers and take the next 3
string portion = string.Join(" ", s.Split(new[] {' ', '\t'}, StringSplitOptions.RemoveEmptyEntries).Skip(10).Take(3));
Console.WriteLine(portion); // outputs "000011 000012 000013"
Note: to start from the first number you would use .Skip(0).
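Since the question is in VB.NET, here is a minimal sketch of the same skip-and-take idea applied to the regex matches from the question instead of a split array. The module and function names (NumberExtractor, ExtractAfter) are made up for illustration. Cast(Of Match)() is used on the assumption that the code may target .NET Framework, where MatchCollection only implements the non-generic IEnumerable.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NumberExtractor
    ' Hypothetical helper: returns takeCount six-digit numbers found after the
    ' first skipCount numbers, joined with single spaces.
    Function ExtractAfter(input As String, skipCount As Integer, takeCount As Integer) As String
        ' Same pattern as in the question: exactly six digits, not part of a longer run.
        Dim allMatches As IEnumerable(Of Match) =
            Regex.Matches(input, "(?<!\d)\d{6}(?!\d)").Cast(Of Match)()
        Dim wanted As IEnumerable(Of String) =
            allMatches.Skip(skipCount).Take(takeCount).Select(Function(m) m.Value)
        Return String.Join(" ", wanted)
    End Function

    Sub Main()
        ' Mixed separators on purpose: double space, tab and a line break.
        Dim s As String = "288007 327920  374740" & vbTab & "000368 044575 082865 680798" & vbCrLf &
                          "717374 755879 811106 855460 920577 953515 996819"
        Console.WriteLine(ExtractAfter(s, 4, 3))  ' 044575 082865 680798
        Console.WriteLine(ExtractAfter(s, 10, 2)) ' 855460 920577
    End Sub
End Module
```

Because the numbers are located by the pattern rather than by position, the kind and number of separator characters does not matter here.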
But if your string is in the rigid format you show (assuming the variable numbers of spaces are typos, thanks @ErikE), Coenraad's method of calculating where the required string starts and how many characters to take would be more efficient. I'll leave it to Coenraad to expand on that answer, as it would not be fair to possibly take the points.
I tried and tried to make the regex method consistently fast, but I found that its speed depended strongly on which numbers you want to retrieve.
For anyone wanting to test that, I put a default Chart on a Form and used this code:
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Windows.Forms.DataVisualization
Imports System.Windows.Forms.DataVisualization.Charting
Public Class Form1
Sub DoStuff()
Dim ser1 As New Series With {.Name = "String.Split"}
Dim ser2 As New Series With {.Name = "RegEx"}
Dim sb As New StringBuilder()
For i As Integer = 1 To 10000
sb.Append(i.ToString("000000") + " ")
Next
Dim s As String = sb.ToString()
Dim sw As New Stopwatch()
Dim itemsToTake As Integer = 50
For firstItem = 1 To 9000 Step 100
sw.Restart()
Dim portion As String = String.Join(" ", s.Split({" "c}, StringSplitOptions.RemoveEmptyEntries).Skip(firstItem - 1).Take(itemsToTake))
sw.Stop()
ser1.Points.AddXY(firstItem -1, sw.ElapsedTicks)
Dim pattern = "^(?:\d+\s+){" + (firstItem - 1).ToString() + "}((\d+)\s+){" + itemsToTake.ToString() + "}"
Dim re = New Regex(pattern)
sw.Restart()
Dim matches = re.Matches(s)
Dim cs = matches(0).Groups(2).Captures ' group 2 is the inner (\d+), i.e. the numbers taken; Groups(0) is only the whole match
sw.Stop()
ser2.Points.AddXY(firstItem - 1, sw.ElapsedTicks)
Next
Chart1.Series.Clear()
Chart1.Series.Add(ser1)
Chart1.Series(0).ChartType = SeriesChartType.Line
Chart1.Series.Add(ser2)
Chart1.Series(1).ChartType = SeriesChartType.Line
Chart1.ChartAreas(0).AxisX.IsMarginVisible = False
Chart1.ChartAreas(0).AxisX.Title = "First item to retrieve"
Chart1.ChartAreas(0).AxisY.Title = "Time taken"
End Sub
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
DoStuff()
End Sub
End Class
Upvotes: 0
Reputation: 12748
To expand on my comment: this assumes that the actual data is divided equally.
If each number has 6 digits followed by a single space, then the text after the 4th number starts at character position (6+1)*4, and if you want 3 numbers you just need to fetch (6+1)*3 characters from there (the last of which is the trailing space).
Dim str As String
str = "288007 327920 374740 000368 044575 082865 680798 717374 755879 811106 855460 920577 953515 996819"
Dim startingNumber As Integer = 4
Dim amountToFetch As Integer = 3
' 7 = [size of each number] + [delimiter length]
' 7 = 6 + 1
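' Subtract 1 to leave out the trailing delimiter; without it, a request that
' includes the very last number would run past the end of the string.
Console.WriteLine(str.Substring(7 * startingNumber, 7 * amountToFetch - 1))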
Console.ReadLine()
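The arithmetic above can be wrapped in a small function. The following is only a sketch, under the same assumption that every number is exactly 6 digits followed by exactly one delimiter character; the names FixedWidthExtractor and SliceNumbers are invented for this example, and the clamping near the end of the string is an addition, not part of the snippet above.

```vbnet
Imports System

Module FixedWidthExtractor
    ' Hypothetical helper. Assumes a rigid layout: each number is `width` digits
    ' followed by exactly one delimiter character (none required after the last).
    Function SliceNumbers(data As String, startingNumber As Integer,
                          amountToFetch As Integer, Optional width As Integer = 6) As String
        Dim stride As Integer = width + 1                 ' 7 = 6 digits + 1 delimiter
        Dim startIndex As Integer = stride * startingNumber
        If startingNumber < 0 OrElse amountToFetch <= 0 OrElse startIndex >= data.Length Then
            Return String.Empty
        End If
        ' Leave out the trailing delimiter, and clamp so that asking for more
        ' numbers than remain returns what is there instead of throwing.
        Dim charCount As Integer = Math.Min(stride * amountToFetch - 1, data.Length - startIndex)
        Return data.Substring(startIndex, charCount)
    End Function

    Sub Main()
        Dim str As String = "288007 327920 374740 000368 044575 082865 680798 717374 755879 811106 855460 920577 953515 996819"
        Console.WriteLine(SliceNumbers(str, 4, 3))  ' 044575 082865 680798
        Console.WriteLine(SliceNumbers(str, 10, 2)) ' 855460 920577
        Console.WriteLine(SliceNumbers(str, 12, 5)) ' 953515 996819 (only two numbers remain)
    End Sub
End Module
```

This does no scanning at all, so its cost does not depend on where in the string the numbers are, but it will return wrong text as soon as a separator is longer than one character.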
Upvotes: 1
Reputation: 10605
If you would like a regex-only solution in C#, the following code handles the "3 numbers after the 4th number" example.
var st = @"288007 327920 374740 000368 044575 082865 680798
717374 755879 811106 855460 920577 953515 996819";
var pattern = @"^(\d+\s+){4}((?<x>\d+)\s+){3}";
var matches = Regex.Matches(st,pattern,RegexOptions.Singleline);
foreach (Capture m in matches[0].Groups["x"].Captures)
Console.WriteLine("value={0}", m.Value);
(Edit: removed one group per comment below)
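A VB.NET sketch of the same quantifier-based pattern, with the two counts passed in as parameters, could look like this. The names QuantifierExtractor and ExtractWithQuantifiers are hypothetical, and two details are additions of this sketch rather than part of the pattern above: a leading \s* to tolerate whitespace at the start, and (?:\s+|$) so that the last number in the string can be captured even when nothing follows it.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module QuantifierExtractor
    ' Hypothetical helper: skips skipCount numbers from the start of the input and
    ' returns the next takeCount numbers, or an empty list if there are not enough.
    Function ExtractWithQuantifiers(input As String, skipCount As Integer,
                                    takeCount As Integer) As List(Of String)
        Dim pattern As String =
            "^\s*(?:\d+\s+){" & skipCount.ToString() & "}" &
            "(?:(?<x>\d+)(?:\s+|$)){" & takeCount.ToString() & "}"
        Dim result As New List(Of String)()
        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then
            ' .NET keeps every capture of a repeated group, not just the last one.
            For Each c As Capture In m.Groups("x").Captures
                result.Add(c.Value)
            Next
        End If
        Return result
    End Function

    Sub Main()
        Dim st As String = "288007 327920 374740 000368 044575 082865 680798" & vbCrLf &
                           "717374 755879 811106 855460 920577 953515 996819"
        For Each value As String In ExtractWithQuantifiers(st, 4, 3)
            Console.WriteLine("value={0}", value)
        Next
    End Sub
End Module
```

Note that the regex engine still has to walk over every skipped number from the start of the string, so the work grows with the number of items skipped.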
Upvotes: 0
Reputation: 113
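Wouldn't this be simpler using maths?
Assuming each number takes 7 characters (6 digits plus one separator), the three numbers after the 4th number are the (7 * 3) characters starting at offset (7 * 4).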
Upvotes: 2