riqitang

Reputation: 3371

Is this possible with regex?

I'm parsing a file where I want to extract a certain string.

The string will be preceded by some length of whitespace, followed by either:

or

followed by a carriage return and newline.

Is it possible for me to make an expression that is equivalent to "if the character is H, then skip 8 characters, else if the character is a G then skip 9 characters" or even more simply "if the character is an H, skip 8 characters, else skip 9 characters".

The current regex I have that works well with H is @"\s+H.{8}(?<user>.*)\r\n", but I'm stumped when it comes to adding conditional character counts. For instance, it'd be really nice if there were some syntax like [H|G].{8|9}, but I don't think this actually exists in regex syntax.

Upvotes: 2

Views: 159

Answers (4)

ΩmegaMan

Reputation: 31656

This uses two conditional (if) constructs. Set RegexOptions.IgnorePatternWhitespace so that the pattern can carry comments:

(?(H0[xX][0-9a-fA-F]{6}[^\r\n\d]+)     # If an H followed by 0x and 6 hex digits is found
     H.{8}(?<User>[^\r\n\d]+)          # Then skip H plus 8 characters and capture the H user
    |                                  # else
    (?(G0[xX][0-9a-fA-F]{7}[^\r\n\d]+) # If a G followed by 0x and 7 hex digits is found
      G.{9}(?<User>[^\r\n\d]+))        # Then skip G plus 9 characters and capture the G user
 )
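
As a rough sketch of how this commented pattern might be run from C#: the class name, the ExtractUser helper and the sample inputs are invented for illustration, and the inputs assume the 0x-prefixed shape that the pattern itself implies. Because the inner conditional has no else branch, the pattern can also match the empty string, so the sketch checks the User group rather than match.Success.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // The pattern from this answer; whitespace and # comments are ignored
    // because RegexOptions.IgnorePatternWhitespace is set below.
    private const string Pattern = @"
(?(H0[xX][0-9a-fA-F]{6}[^\r\n\d]+)     # if H, 0x and 6 hex digits come next
     H.{8}(?<User>[^\r\n\d]+)          # then skip H plus 8 characters
    |                                  # else
    (?(G0[xX][0-9a-fA-F]{7}[^\r\n\d]+) # if G, 0x and 7 hex digits come next
      G.{9}(?<User>[^\r\n\d]+))        # then skip G plus 9 characters
 )";

    private static readonly Regex UserPattern =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the first captured user, or null if none.
    // Empty matches (neither condition held) are skipped by testing the group.
    public static string ExtractUser(string line)
    {
        foreach (Match match in UserPattern.Matches(line))
        {
            Group user = match.Groups["User"];
            if (user.Success)
            {
                return user.Value;
            }
        }
        return null;
    }

    public static void Main()
    {
        // Invented inputs, not taken from the question.
        Console.WriteLine(ExtractUser("H0x12ABCDalice"));
        Console.WriteLine(ExtractUser("G0x123ABCDbob"));
    }
}
```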

Update

The Achilles' heel is that it is unclear what a username consists of. If the username contains a digit, say 1OmegaMan, this will fail. But the OP has not specified that rule, nor given any clear examples.

So the assumption here is that a username is all alphabetic characters.

A better pattern to search for might be H\d{8}[A-Z][^\r\n]+, which requires at least one alphabetic character after the digits to separate the username from them. Note that [A-Z] matches uppercase letters only, unless RegexOptions.IgnoreCase is set.

Upvotes: 0

T. Kiley

Reputation: 2802

As per my comment, I just elaborated on yours to get

\s+((H.{8})|(G.{9}))(?<user>.*)\r\n

Since regular expressions correspond to finite state automata, it is easy to see why this is trivial: on reading an H we go into one state, and on reading a G into the other.
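
A minimal C# sketch of how this alternation could be used. The sample lines and the ExtractUser helper are made up for illustration, since the question shows no real input, and the prefix groups are written as non-capturing because only the user group is needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Whitespace, then either H + 8 characters or G + 9 characters,
    // then the user, terminated by CRLF.
    private static readonly Regex UserPattern =
        new Regex(@"\s+(?:H.{8}|G.{9})(?<user>.*)\r\n");

    // Hypothetical helper: returns the captured user, or null when nothing matches.
    public static string ExtractUser(string line)
    {
        Match match = UserPattern.Match(line);
        return match.Success ? match.Groups["user"].Value : null;
    }

    public static void Main()
    {
        // Invented sample lines.
        Console.WriteLine(ExtractUser("   H12345678alice\r\n"));  // alice
        Console.WriteLine(ExtractUser("   G123456789bob\r\n"));   // bob
    }
}
```

The greedy .* initially swallows the \r as well (in .NET, . excludes only \n), then backtracks one character so that the trailing \r\n can match.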

Upvotes: 4

Sriram Sakthivel

Reputation: 73482

Well, it is possible with regex: you can use conditional constructs.

Here is the main part of the regex you're struggling with. I assume you can build on this.

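using System;
using System.Text.RegularExpressions;

var subject = "H12345678ABC";
// If the next character is H, consume it plus 8 characters; otherwise consume 10.
var regex = new Regex(@"(?((?<hgroup>H))\k<hgroup>.{8}|.{10})(?<user>.*)");
var match = regex.Match(subject);
if (match.Success)
{
    Console.WriteLine(match.Groups["user"].Value); // prints ABC
}
else
{
    Console.WriteLine("No Match");
}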
Breakdown:

(?<hgroup>H)    Matches H and stores it in the group hgroup
\k<hgroup>.{8}  If that succeeded, matches H followed by any 8 characters
.{10}           Otherwise, matches the next 10 characters (G followed by 9 other characters)
(?<user>.*)     Captures everything that remains into the user group

Upvotes: 1

Toto

Reputation: 91428

I'd use:

\s+(?:H[a-fA-F0-9]{8}|G[a-fA-F0-9]{9})(?<user>.*)\r\n
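
A sketch of how this pattern might be applied to a whole file's contents with Regex.Matches. The class name, the ExtractUsers helper and the sample text are assumptions made for illustration; real input would presumably come from something like File.ReadAllText.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HexPrefixDemo
{
    // H + 8 hex digits or G + 9 hex digits, then the user, terminated by CRLF.
    private static readonly Regex UserPattern =
        new Regex(@"\s+(?:H[a-fA-F0-9]{8}|G[a-fA-F0-9]{9})(?<user>.*)\r\n");

    // Hypothetical helper: collects the user from every matching line.
    public static List<string> ExtractUsers(string text)
    {
        var users = new List<string>();
        foreach (Match match in UserPattern.Matches(text))
        {
            users.Add(match.Groups["user"].Value);
        }
        return users;
    }

    public static void Main()
    {
        // Invented sample: the third line has neither an H nor a G prefix.
        string text = "  H0012ABCDalice\r\n  G0012ABCDEbob\r\n  X00000000carol\r\n";
        foreach (string user in ExtractUsers(text))
        {
            Console.WriteLine(user);  // alice, then bob
        }
    }
}
```

Because the quantifiers are exact counts, a user name that happens to start with a hex letter (such as "alice" or "bob" here) is not absorbed into the prefix.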

Upvotes: 2
