JaneKealum

Reputation: 43

What does this regular expression mean?

/\ATo\:\s+(.*)/

Also, how do you work it out, what's the approach?

Upvotes: 4

Views: 2089

Answers (6)

carlpett

Reputation: 12583

First, you need to know what the different escape sequences and quantifiers are. Escape sequences are the backslash-prefixed characters: some are character classes (\s from your regex, for instance) and some are anchors (\A). Quantifiers are symbols such as the +. There are several references on the internet, for instance this one.

Using that, we can see what happens by going left to right:

  • \A matches a beginning of the string.
  • To matches the text "To" literally
  • \: matches a literal ":". A colon has no special meaning at this position, so the backslash is harmless but not required
  • \s matches whitespace (space, tab, etc)
  • + means to match the previous item one or more times, so \s+ means one or more whitespace characters
  • () is a capture group, anything matched within the parens is saved for later use
  • . means "any character" (except a newline, by default)
  • * is like the +, but zero or more times, so .* means any number of any characters

Taken together, the regex matches a string beginning with "To:", then at least one whitespace character, and then anything, which it saves. So, with the string "To: JaneKealum", you'll be able to extract "JaneKealum".
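To try the walkthrough above, here is a minimal sketch in Python. Python is an assumption on my part, since the question doesn't say which regex flavour is in use; the names `pattern`, `recipient` and `no_match` are just for illustration.

```python
import re

# The pattern from the question, written as a raw string so the
# backslashes reach the regex engine unchanged.
pattern = re.compile(r"\ATo\:\s+(.*)")

# Group 1 holds whatever (.*) matched after "To:" and the whitespace.
match = pattern.match("To: JaneKealum")
recipient = match.group(1) if match else None
print(recipient)

# A string that does not begin with "To:" does not match at all.
no_match = pattern.match("Cc: JaneKealum")
print(no_match)
```

The first print shows the captured name, the second shows None because the input does not start with "To:".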

Upvotes: 1

Simon G.

Reputation: 6707

The initial and trailing / characters delimit the regular expression.

A \ inside the expression means to treat the following character specially or treat it as a literal if it normally has a special meaning.

The \A means match only at the beginning of a string.

To means match the literal "To"

\: means match a literal ':'. A colon is normally a literal anyway and has no special meaning here, so the escape is not needed.

\s means match a whitespace character.

+ means match as many as possible but at least one of whatever it follows, so \s+ means match one or more whitespace characters.

The ( and ) define a group of characters that will be captured and returned by the expression evaluator.

And finally the . matches any character (except a newline, by default) and the * means match as many as possible but can be zero. Therefore the (.*) will capture all characters to the end of the line, which is the end of the input string if it contains no newline.

So the pattern will match a string that starts with "To:" followed by whitespace, and capture everything from the first non-whitespace character after it.

The only way to really understand these things is to go through them one bit at a time and check the meaning of each component.
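One way to do that "one bit at a time" reading is to write the pattern in verbose mode, with a comment per component. This is only a sketch assuming Python's re module (the question doesn't name a flavour); `annotated`, `compact` and `capture` are made-up names for the example.

```python
import re

# Same pattern, spread out with re.VERBOSE so each piece can be commented.
# In verbose mode, whitespace in the pattern and text after # are ignored.
annotated = re.compile(r"""
    \A      # anchor: only at the very start of the string
    To      # the literal characters T and o
    \:      # a literal colon
    \s+     # one or more whitespace characters
    (.*)    # capture group 1: the rest of the line
""", re.VERBOSE)

# The original one-line form, for comparison.
compact = re.compile(r"\ATo\:\s+(.*)")


def capture(regex, text):
    """Return group 1 if the regex matches at the start of text, else None."""
    m = regex.match(text)
    return m.group(1) if m else None


samples = ["To: JaneKealum", "To:\t  someone", "From: nobody"]
for text in samples:
    print(repr(text), capture(annotated, text), capture(compact, text))
```

Both forms should give the same result for every sample, which is a quick check that the annotations describe the pattern correctly.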

Upvotes: 0

mike3996

Reputation: 17497

You start from the left and look for any escaped characters (e.g. \A). The rest are normal characters. \A means the start of the input, so To: must be matched at the very beginning of the input. I think the : is escaped for nothing. \s is a character class for all whitespace (tabs, spaces, newlines) and the + that follows it means you must have one or more whitespace characters. After that you capture all the rest of the line in a group (marked with ( )).

If the input was

To:   progo@home

the capture group would contain "progo@home"
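A quick way to check both points (the colon needs no escape, and the run of spaces is swallowed by \s+) is a sketch like the following, assuming Python's re module; `escaped` and `unescaped` are just illustrative names.

```python
import re

# The question's pattern, and the same pattern without the backslash
# in front of the colon.
escaped = re.compile(r"\ATo\:\s+(.*)")
unescaped = re.compile(r"\ATo:\s+(.*)")

text = "To:   progo@home"

# \s+ consumes all three spaces, so the group starts at the "p".
escaped_capture = escaped.match(text).group(1)
unescaped_capture = unescaped.match(text).group(1)

print(escaped_capture)
print(unescaped_capture)
```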

Upvotes: 0

paxdiablo

Reputation: 881313

In multi-line regular expressions, \A matches the start of the string (and \Z is end of string, while ^/$ matches the start/end of the string or the start/end of a line). In single line variants, you just use ^ and $ for start and end of string/line since there is no distinction.
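The difference between \A and ^ can be seen with a small sketch. It assumes Python's re module, where multi-line behaviour is switched on with the re.MULTILINE flag; other flavours spell this differently, and the sample text is made up.

```python
import re

# "To:" appears at the start of the second line, not at the start
# of the string.
text = "Re: hello\nTo: bob"

# \A only ever matches at the start of the whole string, even with
# re.MULTILINE, so this search finds nothing.
string_start = re.search(r"\ATo\:\s+(.*)", text, re.MULTILINE)

# With re.MULTILINE, ^ also matches right after each newline.
line_start = re.search(r"^To\:\s+(.*)", text, re.MULTILINE)

print(string_start)
print(line_start.group(1))
```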

To is literal, \: is an escaped :.

\s means whitespace and the + means one or more of the preceding "characters" (white space in this case).

() is a capturing group, meaning everything in here will be stored in a "register" that you can use. Hence, this is the meat that will be extracted.

.* simply means any non-newline character (.), zero or more times (*).

So, what this regex will do is process a string like:

To: paxdiablo
Re: you are so cool!

and return the text paxdiablo.
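As a sketch of that two-line case, assuming Python's re module (the flavour isn't stated in the question): by default the capture stops at the newline, and only a "dot matches newline" flag, re.DOTALL in Python, makes it run to the end of the input.

```python
import re

text = "To: paxdiablo\nRe: you are so cool!"

# By default . does not match a newline, so (.*) stops at the end
# of the first line.
default_capture = re.match(r"\ATo\:\s+(.*)", text).group(1)

# With re.DOTALL, . also matches newlines, so (.*) runs to the end
# of the whole string.
dotall_capture = re.match(r"\ATo\:\s+(.*)", text, re.DOTALL).group(1)

print(repr(default_capture))
print(repr(dotall_capture))
```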

As to how to learn how to work this out yourself, the Perl regex tutorial(a) is a good start, and then practise, practise, practise :-)


(a) You haven't actually stated which regex implementation you're using but most modern ones are very similar to Perl. If you can find a specific tutorial for your particular flavour, that would obviously be better.

Upvotes: 4

Ignacio Vazquez-Abrams

Reputation: 798566

It matches To: at the beginning of the input, followed by at least one whitespace character, followed by any number of characters as a group.

Upvotes: 0

Linus Kleen

Reputation: 34632

\A is a zero-width assertion and means "Match only at beginning of string".

The regex reads: In a string beginning with "To:" followed by one or more whitespace characters (\s+), capture the remainder of the line ((.*)).

Upvotes: 2
