Reputation: 43
/\ATo\:\s+(.*)/
Also, how do you work it out, what's the approach?
Upvotes: 4
Views: 2089
Reputation: 12583
First, you need to know what the different anchors, character classes and quantifiers are. Many of them are backslash-prefixed sequences: \A from your regex is an anchor, and \s is a character class. Quantifiers are, for instance, the +. There are several references on the internet, for instance this one.
Using that, we can see what happens by going left to right:
\A matches the beginning of the string.
To matches the text "To" literally.
\: escapes the ":". A colon has no special meaning here, so the escape is harmless and this matches just a colon.
\s matches whitespace (space, tab, etc.).
+ means to match the previous item one or more times, so \s+ means one or more whitespace characters.
() is a capture group; anything matched within the parens is saved for later use.
. means "any character".
* is like the +, but zero or more times, so .* means any number of any characters.

Taking that together, the regex will match a string beginning with "To:", then at least one whitespace character, and then anything, which it will save. So, with the string "To: JaneKealum", you'll be able to extract "JaneKealum".
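To see the extraction in action, here is a minimal sketch in Python. The question doesn't say which regex flavour is in use, so Python's re module is an assumption on my part; it happens to support \A and \s with the same meaning.

```python
import re

# The pattern from the question, as a raw string so the backslashes
# reach the regex engine unchanged.
pattern = re.compile(r"\ATo\:\s+(.*)")

match = pattern.search("To: JaneKealum")
recipient = match.group(1)  # contents of the first capture group
print(recipient)            # JaneKealum

# The text does not start with "To:", so there is no match at all.
no_match = pattern.search("Cc: JaneKealum")
print(no_match)             # None
```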
Upvotes: 1
Reputation: 6707
The initial and trailing / characters delimit the regular expression.
A \ inside the expression means to treat the following character specially, or to treat it as a literal if it normally has a special meaning.
\A means match only at the beginning of the string.
To means match the literal "To".
\: means match a literal ':'. A colon is already a literal with no special meaning here, so the escape is not needed.
\s means match a whitespace character.
+ means match as many as possible, but at least one, of whatever it follows, so \s+ means match one or more whitespace characters.
The ( and ) define a group of characters that will be captured and returned by the expression evaluator.
And finally the . matches any character (by default, any character except a newline) and the * means match as many as possible, but possibly zero. Therefore the (.*) will capture all characters to the end of the line, or to the end of the input string if it contains no newline.
So the pattern will match a string that starts with "To:" followed by whitespace, and capture everything from the first non-whitespace character after that whitespace.
The only way to really understand these things is to go through them one bit at a time and check the meaning of each component.
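One way to go through it one bit at a time is to write the pattern out with a comment per component, then try ever longer prefixes of it against a sample string. This is only a sketch, assuming Python's re module (the question doesn't name a flavour); the sample text is made up.

```python
import re

# re.VERBOSE ignores whitespace inside the pattern and allows # comments,
# so each component can be annotated on its own line.
pattern = re.compile(
    r"""
    \A      # anchor: only match at the very start of the string
    To      # the literal characters T and o
    \:      # a literal colon (the backslash is optional here)
    \s+     # one or more whitespace characters
    (.*)    # group 1: zero or more of any character except a newline
    """,
    re.VERBOSE,
)

text = "To:   someone"  # hypothetical sample input

# Grow the pattern piece by piece and print what each prefix matches.
pieces = (r"\A", r"\ATo", r"\ATo\:", r"\ATo\:\s+", r"\ATo\:\s+(.*)")
for piece in pieces:
    m = re.search(piece, text)
    print(f"{piece!r:20} matched {m.group(0)!r}")

captured = pattern.search(text).group(1)
print(captured)
```

The capture starts at the first non-whitespace character because \s+ has already consumed all of the whitespace after the colon.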
Upvotes: 0
Reputation: 17497
You start from the left and look for any escaped (i.e. \A) characters. The rest are normal characters. \A means the start of the input. So To: must be matched at the very beginning of the input. I think the : is escaped for nothing. \s is a character class for all whitespace (tabs, spaces, possibly newlines) and the + that follows it means you must have one or more whitespace characters. After that you capture all the rest of the line in a group (marked with ( )).
If the input was
To: progo@home
the capture group would contain "progo@home"
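A quick way to check that the colon is escaped for nothing is to run the pattern with and without the backslash. This sketch assumes Python's re module; other flavours should behave the same way for a colon, but I haven't verified each one.

```python
import re

text = "To: progo@home"

# Same pattern, once with the escaped colon and once with a plain colon.
escaped = re.search(r"\ATo\:\s+(.*)", text)
unescaped = re.search(r"\ATo:\s+(.*)", text)

print(escaped.group(1))    # progo@home
print(unescaped.group(1))  # progo@home
```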
Upvotes: 0
Reputation: 881313
In multi-line regular expressions, \A matches the start of the string (and \Z is the end of the string), while ^/$ match the start/end of the string or the start/end of a line. In single-line variants, you just use ^ and $ for start and end of string/line since there is no distinction.
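The difference between \A and ^ is easiest to see on input where "To:" is not on the first line. A small sketch, assuming Python's re module, where multi-line mode is switched on with the re.MULTILINE flag (other flavours use different switches); the input text is invented for the demo.

```python
import re

text = "Subject: hi\nTo: paxdiablo"

# \A only ever matches at the very start of the string, even in
# multi-line mode, so this finds nothing.
anchored = re.search(r"\ATo:\s+(.*)", text, re.MULTILINE)
print(anchored)              # None

# With re.MULTILINE, ^ also matches right after each newline.
per_line = re.search(r"^To:\s+(.*)", text, re.MULTILINE)
print(per_line.group(1))     # paxdiablo

# Without re.MULTILINE, ^ behaves like \A.
single = re.search(r"^To:\s+(.*)", text)
print(single)                # None
```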
To is literal, \: is an escaped :.
\s means whitespace and the + means one or more of the preceding "characters" (whitespace in this case).
() is a capturing group, meaning everything in here will be stored in a "register" that you can use. Hence, this is the meat that will be extracted.
.* simply means any non-newline character (.), zero or more times (*).
So, what this regex will do is process a string like:
To: paxdiablo
Re: you are so cool!
and return the text paxdiablo.
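Here is that two-line example as a runnable sketch, assuming Python's re module since the flavour isn't stated. The second search uses re.DOTALL, which is Python's flag for letting . match newlines too, to show why the capture normally stops at the end of the first line.

```python
import re

text = "To: paxdiablo\nRe: you are so cool!"

# By default . does not match a newline, so the capture stops at the
# end of the first line.
first_line = re.search(r"\ATo\:\s+(.*)", text)
print(first_line.group(1))        # paxdiablo

# With re.DOTALL, . matches newlines as well and the capture runs to
# the end of the whole string.
everything = re.search(r"\ATo\:\s+(.*)", text, re.DOTALL)
print(repr(everything.group(1)))  # 'paxdiablo\nRe: you are so cool!'
```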
As to how to learn how to work this out yourself, the Perl regex tutorial(a) is a good start, and then practise, practise, practise :-)
(a) You haven't actually stated which regex implementation you're using but most modern ones are very similar to Perl. If you can find a specific tutorial for your particular flavour, that would obviously be better.
Upvotes: 4
Reputation: 798566
It matches To: at the beginning of the input, followed by at least one whitespace character, followed by any number of characters as a group.
Upvotes: 0
Reputation: 34632
\A is a zero-width assertion and means "Match only at beginning of string".
The regex reads: In a string beginning with "To:" followed by one or more whitespace characters (\s+), capture the remainder of the line ((.*)).
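"Zero-width" means \A checks a position without consuming any characters. A tiny sketch, assuming Python's re module, that looks at the span of each match:

```python
import re

text = "To: someone"  # hypothetical sample input

# \A on its own matches at position 0 and consumes nothing.
anchor_only = re.search(r"\A", text)
print(anchor_only.span())  # (0, 0)

# Adding the literal "To" extends the match by two characters.
anchor_to = re.search(r"\ATo", text)
print(anchor_to.span())    # (0, 2)
```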
Upvotes: 2