Ivan

Reputation: 67

Regex to match comma separated values

I'm new to regex in Java and I wanted to know how I can build one that only accepts a string consisting of one or two comma-separated lists of uppercase letters, separated by a single whitespace.

I need to filter out strings that start with a comma, end with a comma, or contain multiple consecutive commas.

All these would be invalid:

All these would be valid:

I used (\s?("[\w\s]*"|\d*)\s?(,,|$)) to catch consecutive commas, but it doesn't work when the comma is at the beginning or end of one of the whitespace-separated substrings, as in "D, ,D".

Should I aim to split by whitespace and look for a simpler regex for each of the substrings?
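The split-first idea can be sketched in Java as follows. This is a minimal sketch that assumes each list item is a single uppercase letter. The class name `SplitValidator` is hypothetical. The input is split on a single space, at most two pieces are allowed, and every piece must be one comma-separated list.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating "split by whitespace, then validate each piece".
// Assumption: every list item is exactly one uppercase letter.
class SplitValidator {
    // One comma-separated list: a letter, then zero or more ",letter" pairs.
    private static final Pattern LIST = Pattern.compile("[A-Z](?:,[A-Z])*");

    static boolean isValid(String input) {
        // A limit of -1 keeps trailing empty strings, so "A,B " yields ["A,B", ""]
        // and is rejected instead of the trailing space being silently dropped.
        String[] parts = input.split(" ", -1);
        if (parts.length > 2) {
            return false; // more than two lists (or more than one space)
        }
        for (String part : parts) {
            // matches() requires the whole piece to match, so no ^/$ anchors are needed.
            if (!LIST.matcher(part).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"A,B,C", "A,B C,D", ",A", "A,", "A,,B", "D, ,D", "A B C"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

With this approach, "D, ,D" splits into "D," and ",D", and both fail the per-list check.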

Upvotes: 1

Views: 3595

Answers (3)

MC Emperor

Reputation: 22977

That would be something like this:

^[A-Z](,[A-Z])*( [A-Z](,[A-Z])*)*$

Here is what happens:

  • We expect a letter, followed by zero or more occurrences of a comma immediately followed by another letter.
  • Then we optionally accept a space followed by the same pattern. This group may be repeated.

Test: https://regex101.com/r/kzLhtw/1

You could, of course, slightly optimize the regex by making all capturing groups non-capturing: just put ?: immediately after each (, that is, (?:.
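Here is a small sketch of this answer's pattern in Java, with every group made non-capturing as suggested. The outer group uses `*`, so it accepts any number of space-separated lists. The `AT_MOST_TWO` variant is an assumption and not part of the answer: it swaps that `*` for `?` to enforce the question's "one or two lists". The class name `ListPattern` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the pattern above, using non-capturing groups.
class ListPattern {
    // As in the answer: the trailing "*" allows any number of space-separated lists.
    static final Pattern ANY_LISTS =
            Pattern.compile("^[A-Z](?:,[A-Z])*(?: [A-Z](?:,[A-Z])*)*$");

    // Assumed variant: "?" instead of "*" allows at most two lists.
    static final Pattern AT_MOST_TWO =
            Pattern.compile("^[A-Z](?:,[A-Z])*(?: [A-Z](?:,[A-Z])*)?$");

    public static void main(String[] args) {
        for (String s : new String[] {"A,B C", "A B C", "A,", "D, ,D"}) {
            System.out.println("\"" + s + "\" -> any: " + ANY_LISTS.matcher(s).matches()
                    + ", at most two: " + AT_MOST_TWO.matcher(s).matches());
        }
    }
}
```

`Matcher.matches()` already requires the whole input to match, so the `^` and `$` anchors are redundant in Java but harmless.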

Upvotes: 3

JvdV

Reputation: 75840

"a string that consists of one or two comma-separated lists of uppercase letters, separated by a single whitespace"

I'm not sure exactly how to interpret the above, but my reading is: one or two comma-separated lists where each list may only consist of uppercase characters. In the case of two lists, the two lists are separated by a single space.

You could try:

^(?!.* .* )[A-Z](?:[ ,][A-Z])*$

See the online demo

  • ^ - Start string anchor.
  • (?!.* .* ) - Negative lookahead that rejects any string containing two or more spaces.
  • [A-Z] - A single uppercase alpha-char.
  • (?: - Open non-capture group:
    • [ ,] - A comma or space.
    • [A-Z] - A single uppercase alpha-char.
    • )* - Close the non-capture group and match it zero or more times, up to:
  • $ - End string anchor.
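Here is a minimal Java sketch of this answer's pattern. It assumes single-line input, because `.` in the lookahead does not match line terminators by default. The class name `SingleSpacePattern` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the lookahead-based pattern above.
class SingleSpacePattern {
    // Letters joined by a single comma or space; the lookahead rejects two or more spaces.
    static final Pattern P = Pattern.compile("^(?!.* .* )[A-Z](?:[ ,][A-Z])*$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A,B,C", "A,B C,D", "A B C", "A,,B", "D, ,D"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Because the separator class `[ ,]` must always be followed by a letter, leading, trailing, and doubled commas are all rejected without any extra checks.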

Upvotes: 1

The fourth bird

Reputation: 163217

You might use

^[A-Z](?: [A-Z])*(?:,[A-Z](?: [A-Z])*){0,2}$

  • ^ Start of string
  • [A-Z] Match a single char A-Z
  • (?: [A-Z])* Optionally repeat a space and a single char A-Z
  • (?: Non capture group
    • ,[A-Z](?: [A-Z])* Match a comma and a char A-Z, optionally followed by repeated matches of a space and a char A-Z
  • ){0,2} Close the group and repeat 0-2 times
  • $ End of string

Regex demo
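Here is a minimal Java sketch of this answer's pattern. As written, it treats space-separated letters as one segment and commas as the separators between segments. Because of `{0,2}`, it allows one to three segments. The class name `SegmentPattern` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the {0,2}-bounded pattern above.
class SegmentPattern {
    // A segment of space-separated letters, then 0-2 more segments each introduced by a comma.
    static final Pattern P =
            Pattern.compile("^[A-Z](?: [A-Z])*(?:,[A-Z](?: [A-Z])*){0,2}$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A B,C D", "A,B,C", "A,B,C,D", "A,,B", ",A"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```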

Upvotes: 2
