This is a collection of common Q&A. This is also a Community Wiki, so everyone is invited to participate in maintaining it.
The regex tag is suffering from "give me ze code" type questions and poor answers with no explanation. This reference is meant to provide links to quality Q&A.
This reference is meant for the following languages: php, perl, javascript, python, ruby, java, .net.
This might be too broad, but these languages share the same syntax. For specific features there's the tag of the language behind it, example:
See also a lot of general hints and useful links at the regex tag details page.
Online tutorials
Quantifiers
* : greedy, *? : reluctant, *+ : possessive
+ : greedy, +? : reluctant, ++ : possessive
? : greedy, ?? : reluctant, ?+ : possessive
{n,m} : between n & m, {n,} : n-or-more, {n} : exactly n
{n} and {n}?
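The difference between greedy and reluctant matching, and the counted forms, can be seen in a minimal sketch. It uses Python's re module purely for illustration, and the sample strings are made up. Possessive quantifiers are left out because Python's re only accepts them from version 3.11 on.

```python
import re

text = "<a><b>"

# Greedy: .+ grabs as much as possible, then backtracks to the last '>'.
greedy = re.findall(r"<.+>", text)

# Reluctant: .+? grabs as little as possible, stopping at the first '>'.
reluctant = re.findall(r"<.+?>", text)

# Counted repetition: {2,4} accepts 2 to 4 digits, {3} exactly 3.
between = bool(re.fullmatch(r"\d{2,4}", "123"))
too_long = bool(re.fullmatch(r"\d{2,4}", "12345"))
exactly = bool(re.fullmatch(r"\d{3}", "123"))

print(greedy)     # ['<a><b>']
print(reluctant)  # ['<a>', '<b>']
print(between, too_long, exactly)  # True False True
```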
Character Classes
[...] : any one character, [^...] : negated/any character but
[^] matches any one character including newlines javascript
[\w-[\d]] / [a-z-[qz]] : set subtraction .net, xml-schema, xpath, JGSoft
[\w&&[^\d]] : set intersection java, ruby 1.9+, javascript (with v flag)
[[:alpha:]] : POSIX character classes
[[:<:]] and [[:>:]] : Word boundaries
Why do [^\\D2], [^[^0-9]2], [^2[^0-9]] get different results in Java? java
\d : digit, \D : non-digit
\w : word character, \W : non-word character
\s : whitespace, \S : non-whitespace
Unicode categories (\p{L}, \P{L}, etc.)
Escape Sequences
\h : space-or-tab, \t : tab
\H : Non horizontal whitespace character, \V : Non vertical whitespace character, \N : Non line feed character pcre php5 java-8
\v : vertical tab, \e : the escape character
Anchors
anchor | matches | flavors |
---|---|---|
^ | Start of string | Common (*) |
^ | Start of line | Common (m) |
$ | End of line | Common (m) |
$ | End of text | Common (*) except javascript |
$ | Very end of string | javascript (*), php (D) |
\A | Start of string | Common except javascript |
\Z | End of text | Common except javascript python |
\Z | Very end of string | python |
\z | Very end of string | Common except javascript python |
\b | Word boundary | Common |
\B | Not a word boundary | Common |
\G | End of previous match | Common except javascript, python |
Term | Definition |
---|---|
Start of string | At the very start of the string. |
Start of line | At the very start of the string, and after a non-terminal line terminator. |
Very end of string | At the very end of the string. |
End of text | At the very end of the string, and at a terminal line terminator. |
End of line | At the very end of the string, and at a line terminator. |
Word boundary | At a word character not preceded by a word character, and at a non-word character not preceded by a non-word character. |
End of previous match | At a previously set position, usually where a previous match ended. At the very start of the string if no position was set. |
"Common" refers to the following: icu java javascript .net objective-c pcre perl php python swift ruby
* : Default
m : Multi-line mode.
D : Dollar end only mode.
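The anchor rows for python can be checked with a short sketch. It assumes Python's re module and a made-up sample string ending in a line terminator. In this flavor $ is "End of text" by default and \Z is "Very end of string".

```python
import re

text = "one\ntwo\n"  # note the terminal line terminator

# ^ by default: start of string only. With re.M: start of every line.
start_default = re.findall(r"^\w+", text)
start_multiline = re.findall(r"^\w+", text, re.M)

# $ by default: end of text, so it also matches before the terminal "\n".
end_of_text = re.search(r"two$", text) is not None

# \Z in Python: very end of string, so the terminal "\n" must be consumed.
very_end = re.search(r"two\Z", text) is not None
very_end_with_newline = re.search(r"two\n\Z", text) is not None

# \b: word boundary on both sides keeps "concat" and "cats" out.
whole_words = re.findall(r"\bcat\b", "cat concat cats")

print(start_default, start_multiline)  # ['one'] ['one', 'two']
print(end_of_text, very_end, very_end_with_newline)  # True False True
print(whole_words)  # ['cat']
```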
Groups
(...) : capture group, (?:) : non-capture group
\1 : backreference and capture-group reference, $1 : capture group reference
What does (?i:regex) mean?
What does (?P<group_name>regexp) mean?
(?>) : atomic group or independent group, (?|) : branch reset
regular-expressions.info
(?<groupname>regex) : Overview and naming rules (Non-Stack Overflow links)
(?P<groupname>regex) python, (?<groupname>regex) .net, (?<groupname>regex) perl, (?P<groupname>regex) and (?<groupname>regex) php
(?<-foo>) : balancing groups .net
Lookarounds
(?=...) : positive lookahead, (?!...) : negative lookahead
(?<=...) : positive lookbehind, (?<!...) : negative lookbehind
Modifiers
flag | modifier | flavors |
---|---|---|
a | ASCII | python |
c | current position | perl |
e | expression | php perl |
g | global | most |
i | case-insensitive | most |
m | multiline | php perl python javascript .net java |
m | (non)multiline | ruby |
o | once | perl ruby |
r | non-destructive | perl |
S | study | php |
s | single line | ruby |
U | ungreedy | php r |
u | unicode | most |
x | whitespace-extended | most |
y | sticky ↪ | javascript |
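A few of these modifiers can be tried out in a minimal sketch, assuming Python's re module, where flags are passed as re.I, re.S and re.X or written inline as (?i). The phone-like pattern and the sample strings are made up for illustration.

```python
import re

# i: case-insensitive, either inline or as a flag argument.
inline_i = re.findall(r"(?i)abc", "ABC abc")
flag_i = re.findall(r"abc", "ABC abc", re.I)

# s (re.S / re.DOTALL in Python): lets . match a newline as well.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_single_line = re.search(r"a.b", "a\nb", re.S) is not None

# x: whitespace-extended, so layout and comments inside the pattern are ignored.
phone = re.compile(r"""
    (\d{3})   # first block of digits
    -         # separator
    (\d{4})   # second block of digits
""", re.X)
parts = phone.fullmatch("555-1234").groups()

print(inline_i, flag_i)              # ['ABC', 'abc'] ['ABC', 'abc']
print(dot_default, dot_single_line)  # False True
print(parts)                         # ('555', '1234')
```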
What does ^ in (?^:…) mean in the string form of a Perl qr// Regex?
Other:
| : alternation (OR) operator, . : any character, [.] : literal dot character
(*PRUNE), (*SKIP), (*FAIL) and (*F)
(?R), (?0) and (?1), (?-1), (?&groupname)
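The first line of this list (alternation, the dot, and the literal dot) can be sketched as follows, assuming Python's re module and made-up sample strings. The verbs and recursion constructs are not shown because Python's re does not support them.

```python
import re

sample = "abc a.c a-c"

# . matches any character (except a newline by default).
any_char = re.findall(r"a.c", sample)

# [.] and \. both match only a literal dot.
literal_dot = re.findall(r"a[.]c", sample)
escaped_dot = re.findall(r"a\.c", sample)

# | tries the left alternative first, then the right one.
pets = re.findall(r"cat|dog", "cat dog bird")

print(any_char)     # ['abc', 'a.c', 'a-c']
print(literal_dot)  # ['a.c']
print(escaped_dot)  # ['a.c']
print(pets)         # ['cat', 'dog']
```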
Common Tasks
{...}
Advanced Regex-Fu
(?!a)a
this except in contexts A, B and C
Flavor-Specific Information
(Except for those marked with *, this section contains non-Stack Overflow links.)
java.util.regex.Matcher:
matches(): The match must be anchored to both input-start and -end
find(): A match may be anywhere in the input string (substrings)
lookingAt(): The match must be anchored to input-start only
java.lang.String functions that accept regular expressions: matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), split(s,i)
java.util.regex
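The three anchoring behaviors listed for Matcher can be tried without a JVM. The sketch below uses Python's re module as a rough analogue: fullmatch for matches(), search for find(), and match for lookingAt(). This mapping is an assumption made for illustration and does not come from the linked Java documentation. In particular, find() can be called repeatedly to step through matches, while search returns only the first one.

```python
import re

digits = re.compile(r"\d+")

# Anchored at both ends (like matches()): the trailing "abc" makes it fail.
full = digits.fullmatch("123abc")

# Anchored at the start only (like lookingAt()).
prefix = digits.match("123abc")
no_prefix = digits.match("abc123")

# Anywhere in the input (like find()).
anywhere = digits.search("abc123")

print(full)              # None
print(prefix.group())    # 123
print(no_prefix)         # None
print(anywhere.group())  # 123
```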
preg_match
search vs match, how-to
regex, struct regex::Regex
regexp command
General information
(Links marked with * are non-Stack Overflow links.)
Examples of regex that can cause regex engine to fail
Tools: Testers and Explainers
(This section contains non-Stack Overflow links.)
Online (* includes replacement tester, + includes split tester):
freeformatter.com xregexp
regex.larsolavtorvik.com php PCRE and POSIX, javascript
Offline:
MySQL 8.0: Various syntax changes were made. Note especially the doubling of backslashes in some contexts. (This answer needs further editing to reflect the differences.)