cjohnson

Reputation: 55

Can't get regex to find specific string and associated numbers

I am writing a piece of code that scans public company tax files (.txt files) and pulls out information. I am trying to find certain strings and then grab the information that follows them. For now, though, I am just trying to find the strings. My regex code is:

 using System;
 using System.IO;
 using System.Text.RegularExpressions;

 Regex regCIK = new Regex(@"\s^CENTRAL INDEX KEY:$\s\d+");
 string[] lines = File.ReadAllLines(fileName);
 foreach (string line in lines)
 {
     foreach (Match match in regCIK.Matches(line))
         Console.WriteLine(match);
 }

I'm just looking to find a match and then write it to the console for now to make sure I actually get it.

I've been trying to get the regex right using https://regex101.com/, but can't figure it out.

The line in the text file I am trying to get looks like this:

        CENTRAL INDEX KEY:          ??????????

where each ? is a digit from 0 to 9.

Upvotes: 2

Views: 54

Answers (2)

Josh R

Reputation: 2210

Regexes are tough to get right.

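The caret symbol ^ doesn't mean "start looking for a match here"; it means "only match when this is the start of the string." Likewise, $ means "only match when the string ends right here." Since each line is matched on its own, the \s^ in your pattern demands whitespace before the start of the string and :$\s demands whitespace after its end, so the pattern can never match.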
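To see the effect of the anchors, here is a small sketch (C# top-level statements, assuming .NET 6 or later) that runs the question's pattern and an unanchored version against one sample line. The digits in the sample line are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample line shaped like the one in the question; the digits are made up.
string line = "        CENTRAL INDEX KEY:          0000123456";

// The question's pattern: \s^ needs a whitespace character before the start
// of the string, and :$\s needs whitespace after the end, so it cannot match.
bool original = Regex.IsMatch(line, @"\s^CENTRAL INDEX KEY:$\s\d+");

// The same idea without the anchors does match.
bool unanchored = Regex.IsMatch(line, @"CENTRAL INDEX KEY:\s*\d+");

Console.WriteLine($"original: {original}, unanchored: {unanchored}");
// prints "original: False, unanchored: True"
```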

The regex below will match CENTRAL INDEX KEY: 1234567890 dead on.

Walking through the regex:

  1. We're only looking for "CENTRAL INDEX KEY:" (without the quotes) to start our match.
  2. Then we're okay with any amount of whitespace between that phrase and whatever comes next. That's \s (whitespace) followed by *, which means 0 or more of whatever is right before it, i.e. the \s.
  3. Lastly, we're looking for digits, and there have to be exactly 10 of them. That is the \d identifier followed by the count you want in braces, {10}. If we wanted 8 we would write \d{8}; for 12, \d{12}.

Regex regCIK = new Regex(@"CENTRAL INDEX KEY:\s*\d{10}");
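Since the goal is eventually to grab the number itself, one option is to wrap the digits in a capture group. The sketch below (C# top-level statements, .NET 6 or later assumed) uses a named group called "cik", which is our own choice, and an inline array standing in for File.ReadAllLines(fileName); the sample lines, including the "SOME OTHER FIELD" line, are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer, with the 10 digits in a named group "cik"
// (the group name is arbitrary, chosen here for illustration).
var regCIK = new Regex(@"CENTRAL INDEX KEY:\s*(?<cik>\d{10})");

// Stand-in for File.ReadAllLines(fileName); these lines are made up.
string[] lines =
{
    "SOME HEADER LINE",
    "        CENTRAL INDEX KEY:          0000123456",
    "        SOME OTHER FIELD:           some text",
};

string? cik = null;
foreach (string line in lines)
{
    Match m = regCIK.Match(line);
    if (m.Success)
    {
        cik = m.Groups["cik"].Value; // only the digits, without the label
        break;                       // stop at the first match
    }
}

Console.WriteLine(cik ?? "not found"); // prints 0000123456
```

In the real program you would keep the File.ReadAllLines(fileName) call from the question instead of the inline array.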

Upvotes: 0

lc.

Reputation: 116458

^ and $ match the beginning and end of the input (here, the single line you pass in), respectively, and are most likely not what you're looking for. Remove them (and allow for multiple spaces with a *) and it should match:

Regex regCIK = new Regex(@"\s*CENTRAL INDEX KEY:\s*\d+");

In fact, you don't need the opening spaces either:

Regex regCIK = new Regex(@"CENTRAL INDEX KEY:\s*\d+");
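If you do want to keep ^ and $, a sketch of one way to do it (our own illustration, not part of the answer above; C# top-level statements, .NET 6 or later assumed) is to make them wrap the whole line, explicitly allowing the leading indentation and any trailing spaces. The sample lines are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored variant: ^ and $ wrap the whole line, so leading indentation and
// trailing spaces must be allowed for explicitly with \s*.
var anchored = new Regex(@"^\s*CENTRAL INDEX KEY:\s*(\d+)\s*$");

// Made-up sample lines.
string good = "        CENTRAL INDEX KEY:          0000123456";
string extra = "        CENTRAL INDEX KEY:          0000123456 (see note)";

Match m1 = anchored.Match(good);
Match m2 = anchored.Match(extra);

Console.WriteLine(m1.Success ? m1.Groups[1].Value : "no match"); // prints 0000123456
Console.WriteLine(m2.Success ? m2.Groups[1].Value : "no match"); // prints no match
```

The trade-off is strictness: the anchored version rejects lines with anything after the number, which may or may not be what you want.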

Upvotes: 2
