Roman

Reputation: 66156

What is the pattern for empty string?

I need to validate input: the valid variants are either a number or an empty string. What is the corresponding regular expression?

String pattern = "\\d+|<what should be here?>";

UPD: please don't suggest "\d*"; I'm just curious how to express "empty string" in a regexp.

Upvotes: 10

Views: 18635

Answers (7)

polygenelubricants

Reputation: 383726

In this particular case, ^\d*$ would work, but generally speaking, to match pattern or an empty string, you can use:

^$|pattern

Explanation

  • ^ and $ are the beginning and end of the string anchors respectively.
  • | is used to denote alternates, e.g. this|that.
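A minimal Java sketch of the ^$|pattern approach for the question's case, assuming \d+ as the pattern and Matcher.matches() for validation. The class and method names (EmptyOrNumber, isValid) are illustrative only. With matches() the anchors are redundant but harmless.

```java
import java.util.regex.Pattern;

public class EmptyOrNumber {
    // Either the empty string (^$) or one or more digits (\d+).
    private static final Pattern EMPTY_OR_NUMBER = Pattern.compile("^$|\\d+");

    // matches() succeeds only if one alternative covers the whole input.
    static boolean isValid(String input) {
        return EMPTY_OR_NUMBER.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "123", "12a", " "}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```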


Note on multiline mode

In the so-called multiline mode (Pattern.MULTILINE/(?m) in Java), ^ and $ match at the beginning and end of each line instead. The beginning and end of the whole string are then matched by \A and \Z respectively (\z for the absolute end).

So in multiline mode the empty string is matched by \A\Z instead, while ^$ would also match an empty line within a longer string.
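A small Java sketch of the difference, assuming Matcher.find() (which searches for the pattern anywhere in the input) and a hypothetical helper named found. The sample text contains an empty line but is not itself empty.

```java
import java.util.regex.Pattern;

public class MultilineEmpty {
    // True if the regex occurs anywhere in the input (find(), not matches()).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "abc\n\ndef"; // has an empty line, but is not empty

        // In multiline mode ^$ matches the empty line between the two newlines.
        System.out.println(found("(?m)^$", text));     // true
        // \A and \Z still refer to the whole input, so the text is rejected.
        System.out.println(found("(?m)\\A\\Z", text)); // false
        // A genuinely empty string is matched by \A\Z in any mode.
        System.out.println(found("(?m)\\A\\Z", ""));   // true
    }
}
```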


Examples

Here are some examples to illustrate the above points:

String numbers = "012345";

System.out.println(numbers.replaceAll(".", "<$0>"));
// <0><1><2><3><4><5>

System.out.println(numbers.replaceAll("^.", "<$0>"));
// <0>12345

System.out.println(numbers.replaceAll(".$", "<$0>"));
// 01234<5>

numbers = "012\n345\n678";
System.out.println(numbers.replaceAll("^.", "<$0>"));       
// <0>12
// 345
// 678

System.out.println(numbers.replaceAll("(?m)^.", "<$0>"));       
// <0>12
// <3>45
// <6>78

System.out.println(numbers.replaceAll("(?m).\\Z", "<$0>"));     
// 012
// 345
// 67<8>

Note on Java matches

In Java, matches attempts to match a pattern against the entire string.

This is true for String.matches, Pattern.matches and Matcher.matches.

This means that anchors which other flavors, or other Java methods such as find(), would require can often be omitted when using matches().
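A short Java sketch contrasting matches() with find() on the same patterns. The helper names wholeInput and anywhere are hypothetical. With find(), the unanchored \d* happily matches a substring, so the anchors become necessary again.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // matches(): the pattern must cover the whole input, so no anchors needed.
    static boolean wholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): the pattern may match any substring, so anchors pin it down.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("\\d*", ""));    // true
        System.out.println(wholeInput("\\d*", "12a")); // false
        System.out.println(anywhere("\\d*", "12a"));   // true: finds "12"
        System.out.println(anywhere("^\\d*$", "12a")); // false: anchored
    }
}
```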


Upvotes: 18

Wiktor Stribiżew

Reputation: 626738

To make any pattern that matches an entire string optional, i.e. to allow a pattern to match an empty string, use an optional group:

^(pattern)?$
^^       ^^^

If the regex engine supports it (as Java does), prefer a non-capturing group, since its only purpose here is to group subpatterns, not to capture the submatches:

^(?:pattern)?$
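To see the practical difference between the two group kinds, here is a Java sketch (helper name capturingGroups is hypothetical) that counts capturing groups with Matcher.groupCount(). Both forms accept the empty string; the capturing form just records an extra group, which is null when the optional part did not participate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupKinds {
    // Number of capturing groups the pattern defines.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups("^(\\d+)?$"));   // 1: groups and captures
        System.out.println(capturingGroups("^(?:\\d+)?$")); // 0: only groups

        // With the capturing form, group(1) is null after matching "".
        Matcher m = Pattern.compile("^(\\d+)?$").matcher("");
        System.out.println(m.matches() + " " + m.group(1)); // true null
    }
}
```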

The ^ matches the start of the string (\A can be used for this in many flavors), $ matches the end of the string (in many flavors, Java included, \z matches the very end), and (....)? matches the subpatterns inside the parentheses once or not at all, due to the ? quantifier.

A Java usage note: when used in matches(), the initial ^ and trailing $ can be omitted and you can use

String pattern = "(?:\\d+)?";
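The same idea generalizes to any pattern. Below is a sketch with a hypothetical helper, allowEmpty, that wraps an arbitrary pattern in an optional non-capturing group and relies on matches() for the implicit anchoring.

```java
import java.util.regex.Pattern;

public class OptionalPattern {
    // Hypothetical helper: wraps any pattern in an optional non-capturing
    // group, so the result also accepts the empty string.
    static Pattern allowEmpty(String regex) {
        return Pattern.compile("(?:" + regex + ")?");
    }

    public static void main(String[] args) {
        Pattern digits = allowEmpty("\\d+");
        // matches() anchors implicitly, so ^ and $ are not needed here.
        System.out.println(digits.matcher("").matches());   // true
        System.out.println(digits.matcher("42").matches()); // true
        System.out.println(digits.matcher("4a").matches()); // false
    }
}
```

The group matters when the wrapped pattern contains |: without it, cat|dog? would make only the final g optional, and the empty string would still be rejected.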

Upvotes: 0

Konstantin Burlachenko

Reputation: 5665

One way to view the set of regular languages is as the closure of the following:

  1. The special <EMPTY_STRING> is a regular language.
  2. Any symbol from the alphabet is a regular language.
  3. The concatenation of two regular languages is a regular language.
  4. The union of two regular languages is a regular language.
  5. The Kleene closure (star) of a regular language is a regular language.

A concrete regular language is a particular element of this closure.


I didn't find an empty-string symbol in the POSIX standard to express the idea from step (1).

However, POSIX does have the ? quantifier, which matches the preceding expression zero or one time, i.e. it is equivalent to:

(regexp|<EMPTY_STRING>)

So in bash (with grep), Perl, and Python you can write:

echo 9023 | grep -E "(1|90)?23"
perl -e "print 'PASS' if (qq(23) =~ /(1|90)?23/)"
python -c "import re; print(bool(re.match('^(1|90)?23$', '23')))"
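Java, unlike POSIX, also accepts an empty branch in an alternation, so the (regexp|<EMPTY_STRING>) idea can be written literally. A sketch comparing the two spellings on a few inputs (class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class EmptyAlternative {
    // X? written with the ? quantifier...
    static final Pattern WITH_QUESTION_MARK = Pattern.compile("(1|90)?23");
    // ...and the same language as an alternation with an empty branch.
    static final Pattern WITH_EMPTY_BRANCH = Pattern.compile("(?:1|90|)23");

    static boolean viaQuestionMark(String s) { return WITH_QUESTION_MARK.matcher(s).matches(); }
    static boolean viaEmptyBranch(String s)  { return WITH_EMPTY_BRANCH.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"23", "123", "9023", "923"}) {
            System.out.println(s + ": " + viaQuestionMark(s) + " " + viaEmptyBranch(s));
        }
    }
}
```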

Upvotes: 0

unbeli

Reputation: 30228

Just as a funny solution, you can do:

\d+|\d{0}

A digit, zero times. Yes, it does work.
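A quick Java check of this trick, assuming validation via Pattern.matches() (the isValid name is illustrative). \d{0} matches exactly zero digits, i.e. only the empty string.

```java
import java.util.regex.Pattern;

public class ZeroRepetitions {
    // \d{0} means "a digit, exactly zero times": it matches only "".
    static boolean isValid(String s) {
        return Pattern.matches("\\d+|\\d{0}", s);
    }

    public static void main(String[] args) {
        System.out.println(isValid(""));  // true
        System.out.println(isValid("7")); // true
        System.out.println(isValid("x")); // false
    }
}
```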

Upvotes: 1

Tim Pietzcker

Reputation: 336108

To explicitly match the empty string, use \A\Z. (In Java, Perl and PCRE, \Z also matches just before a final line terminator, so \A\z is stricter.)

You can also often see ^$, which works fine unless the multiline option is set, allowing the ^ and $ anchors to match not only at the start or end of the string but also at the start/end of each line. If your input can never contain newlines, then of course ^$ is perfectly OK.

Some regex flavors, notably JavaScript, don't support the \A and \Z anchors.

If you want to allow "empty" as in "nothing or only whitespace", then go for \A\s*\Z or ^\s*$.
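A Java sketch of these anchors using Matcher.find() (the found helper is hypothetical). It shows that \A\Z also accepts a lone trailing newline in Java, while \A\z does not, plus the whitespace-only variant.

```java
import java.util.regex.Pattern;

public class StrictEmpty {
    // True if the regex occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \Z may match just before a final line terminator...
        System.out.println(found("\\A\\Z", "\n"));       // true
        // ...whereas \z matches only at the very end of the input.
        System.out.println(found("\\A\\z", "\n"));       // false
        System.out.println(found("\\A\\z", ""));         // true
        // "Nothing or only whitespace":
        System.out.println(found("\\A\\s*\\Z", "  \t")); // true
        System.out.println(found("\\A\\s*\\Z", " x "));  // false
    }
}
```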

Upvotes: 3

KaptajnKold

Reputation: 10946

/^\d*$/

Matches 0 or more digits with nothing before or after.

Explanation:

The '^' matches the start and '$' the end of the string (or of each line, in multiline mode). '*' matches 0 or more occurrences. So the pattern matches an entire string (or line) consisting of 0 or more digits.
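Whether ^ and $ mean "string" or "line" depends on the mode. A Java sketch (helper name found is hypothetical) applying the same pattern with Matcher.find() to a two-line input:

```java
import java.util.regex.Pattern;

public class LineVsString {
    // True if the regex occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "12\nab";
        // Default mode: ^ and $ anchor to the whole input, so it is rejected.
        System.out.println(found("^\\d*$", input));     // false
        // Multiline mode: the first line "12" satisfies the pattern on its own.
        System.out.println(found("(?m)^\\d*$", input)); // true
    }
}
```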

Upvotes: 6

umop

Reputation: 2192

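There shouldn't be anything wrong with just \d+| (written "\\d+|" in a Java string literal): the empty alternative after the | matches the empty string.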
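As a cross-check, a small Java sketch (class name hypothetical) runs several of the patterns suggested in this thread through Pattern.matches() and reports whether they agree on a few sample inputs. It only demonstrates agreement under matches(), where the whole input must match. Under find(), an unanchored \d+| would match the empty string at any position.

```java
import java.util.regex.Pattern;

public class CompareAnswers {
    // Patterns suggested in this thread, written as Java string literals.
    static final String[] PATTERNS = {
        "\\d+|", "\\d*", "\\d+|\\d{0}", "^$|\\d+", "(?:\\d+)?"
    };

    // True if every pattern gives the same matches() result for this input.
    static boolean allAgree(String input) {
        boolean expected = Pattern.matches(PATTERNS[0], input);
        for (String p : PATTERNS) {
            if (Pattern.matches(p, input) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "0", "123", "12a", " "}) {
            boolean accepted = Pattern.matches(PATTERNS[0], s);
            System.out.println("\"" + s + "\" accepted=" + accepted + ", all agree=" + allAgree(s));
        }
    }
}
```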

Upvotes: 0
