Reputation: 55
I am writing a piece of code that scans public company tax files (.txt files) and pulls out information. I am trying to find certain strings and then grab the information that follows them. For now, though, I am just trying to find the strings. My regex code is:
Regex regCIK = new Regex(@"\s^CENTRAL INDEX KEY:$\s\d+");
string[] lines = File.ReadAllLines(fileName);
foreach (string line in lines)
{
foreach (Match match in regCIK.Matches(line))
Console.WriteLine(match);
}
I'm just looking to find a match and then write it to the console for now to make sure I actually get it.
I've been trying to get the regex right using https://regex101.com/, but can't figure it out.
The line in the text file I am trying to get looks like this:
CENTRAL INDEX KEY: ??????????
Each ? is a digit from 0 to 9.
Upvotes: 2
Views: 54
Reputation: 2210
Regexes are tough to get right.
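The caret symbol ^ doesn't mean "start looking for a match here"; it means "only match at the start of the string." Likewise, $ means "only match at the end of the string." Because each line from File.ReadAllLines is matched on its own, \s^ would need a whitespace character before the start of the line, and :$\s would need the line to end right after the colon, so your pattern can never match.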
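A minimal C# sketch (top-level statements, so .NET 5 or later is assumed) that tests the question's pattern and two variants against a single line, to show which anchors can actually match. The sample key 0000320193 is invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample line, shaped like the one in the question.
string line = "CENTRAL INDEX KEY: 0000320193";

// The question's pattern: \s before ^ needs a character before the start of
// the string, and $ right after the colon needs the string to end there.
bool original = Regex.IsMatch(line, @"\s^CENTRAL INDEX KEY:$\s\d+");

// ^ on its own is fine: this line really does start with the key text.
bool startAnchored = Regex.IsMatch(line, @"^CENTRAL INDEX KEY:\s*\d+");

// $ right after the colon fails, because digits follow the colon.
bool endAfterColon = Regex.IsMatch(line, @"CENTRAL INDEX KEY:$");

Console.WriteLine($"original:      {original}");      // False
Console.WriteLine($"^ at start:    {startAnchored}"); // True
Console.WriteLine($"$ after colon: {endAfterColon}"); // False
```

Only the start-anchored variant matches; the anchors are only a problem where the text around them contradicts them.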
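The regex below matches CENTRAL INDEX KEY: 1234567890 dead on:
Regex regCIK = new Regex(@"CENTRAL INDEX KEY:\s*\d{10}");
Walking through the regex:
- CENTRAL INDEX KEY: matches that literal text.
- \s* matches zero or more whitespace characters (spaces or tabs) after the colon.
- \d{10} matches exactly ten digits (given a longer run of digits, it matches the first ten).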
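Since the question's eventual goal is to grab the value after the label, here is a sketch (C#, top-level statements, .NET 5+ assumed) that wraps the digits in a capture group. The hard-coded lines are a hypothetical stand-in for File.ReadAllLines(fileName), and the key values are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as above, with the ten digits in a capture group
// so the key itself can be read back out.
var regCIK = new Regex(@"CENTRAL INDEX KEY:\s*(\d{10})");

// Hypothetical stand-in for File.ReadAllLines(fileName).
string[] lines =
{
    "Some unrelated header line",
    "CENTRAL INDEX KEY: 0000320193",
    "\tCENTRAL INDEX KEY:\t\t0000123456", // tabs are covered by \s*
    "Another unrelated line"
};

var ciks = new List<string>();
foreach (string line in lines)
{
    Match match = regCIK.Match(line);
    if (match.Success)
    {
        // Groups[0] is the whole match; Groups[1] is just the digits.
        ciks.Add(match.Groups[1].Value);
        Console.WriteLine(match.Groups[1].Value);
    }
}
```

Using Match rather than Matches assumes there is at most one key per line, which fits the line format shown in the question.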
Upvotes: 0
Reputation: 116458
^ and $ match the beginning and end of the input string (and, since each line is passed in separately, of the line), and are most likely not what you're looking for. Remove them (and allow for any amount of whitespace with \s*) and it should match:
Regex regCIK = new Regex(@"\s*CENTRAL INDEX KEY:\s*\d+");
In fact, you don't need the leading \s* either:
Regex regCIK = new Regex(@"CENTRAL INDEX KEY:\s*\d+");
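If you ever read the whole file as one string (for example with File.ReadAllText) instead of line by line, ^ only anchors at line starts when RegexOptions.Multiline is set. A sketch under those assumptions (C#, top-level statements, .NET 5+; the file contents are invented):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for File.ReadAllText(fileName), with Windows line endings.
string text = "Header line one\r\nCENTRAL INDEX KEY: 0000320193\r\nAnother line\r\n";

// Without Multiline, ^ only matches at the very start of the input,
// so a key on the second line is missed.
bool withoutMultiline = Regex.IsMatch(text, @"^CENTRAL INDEX KEY:\s*\d+");

// With Multiline, ^ also matches right after each '\n'.
// [ \t]* is used instead of \s* so the match cannot run onto the next line.
Match m = Regex.Match(text, @"^CENTRAL INDEX KEY:[ \t]*(\d+)", RegexOptions.Multiline);

Console.WriteLine(withoutMultiline);                           // False
Console.WriteLine(m.Success ? m.Groups[1].Value : "no match"); // 0000320193
```

One caveat: in .NET, $ under Multiline matches before '\n' but not before "\r\n", so a trailing $ on a Windows-format file needs something like \r?$.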
Upvotes: 2