Meep3D

Reputation: 3756

Validating items in CSV with regex

I have a CSV string that I am trying to validate via regex to ensure it only has N items. I've tried the following pattern (which looks for 2 items):

/([^,]+){2}/

But it doesn't seem to work; I am guessing that's because the inner pattern isn't greedy enough.

Any ideas? Ideally it should work with both the PHP and JavaScript regex engines.

Update:

For technical reasons I really want to do this via regex rather than another solution. The CSV is not quoted and the values will not contain commas, so that isn't a problem.

/([^,]*[,]{1}[^,]*){1}/

Is where I am at now, which sort of works but is still a bit ugly, and has issues matching one item.

CSV looks like:

apples,bananas,pears,oranges,grapefruit

Upvotes: 4

Views: 2419

Answers (7)

Evan Plaice

Reputation: 14140

Take a look at this answer.

To quote:

re_valid = r"""
# Validate a CSV string having single, double or un-quoted values.
^                                   # Anchor to start of string.
\s*                                 # Allow whitespace before value.
(?:                                 # Group for value alternatives.
  '[^'\\]*(?:\\[\S\s][^'\\]*)*'     # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*"     # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*    # or Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Allow whitespace after value.
(?:                                 # Zero or more additional values
  ,                                 # Values separated by a comma.
  \s*                               # Allow whitespace before value.
  (?:                               # Group for value alternatives.
    '[^'\\]*(?:\\[\S\s][^'\\]*)*'   # Either Single quoted string,
  | "[^"\\]*(?:\\[\S\s][^"\\]*)*"   # or Double quoted string,
  | [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*  # or Non-comma, non-quote stuff.
  )                                 # End group of value alternatives.
  \s*                               # Allow whitespace after value.
)*                                  # Zero or more additional values
$                                   # Anchor to end of string.
"""

Or the usable form (since JS can't handle multi-line regex strings):

var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

It can be called using RegExp.prototype.test():

if (!re_valid.test(text)) return null;

The first alternative matches valid single-quoted strings, the second matches valid double-quoted strings, and the third matches unquoted strings.

If you remove the single-quote alternative, it is an almost 100% implementation of a working IETF RFC 4180 spec CSV validator.

Note: It might be 100%, but I can't remember whether it can handle newline characters in values (I think the [\S\s] is a JavaScript idiom for matching any character, newlines included, since . does not match newlines by default).

Note: This is a JavaScript-only implementation; there are no guarantees that the RegEx source string will work in PHP.

If you're planning on doing anything non-trivial with CSV data, I suggest you adopt an existing library. It gets pretty ugly if you're looking for an RFC-compliant implementation.

Upvotes: 0

Meep3D

Reputation: 3756

Got it.

/^([^,]+([,]{1}|$)){1}$/

Set the last {N} to the number of items to require, or to a range such as {1,3} to accept between one and three items.
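As a rough sketch, the pattern above can be built for any count or range. The helper name `hasItemCount` and its parameters are hypothetical, not part of the original answer. One caveat worth knowing: because the comma-or-end alternation sits inside the repeated group, a trailing comma after the last item is still accepted.

```javascript
// Hypothetical helper wrapping the pattern /^([^,]+([,]{1}|$)){N}$/.
// min and max are the allowed item counts; max defaults to min (exact count).
function hasItemCount(csv, min, max = min) {
  const re = new RegExp("^([^,]+([,]{1}|$)){" + min + "," + max + "}$");
  return re.test(csv);
}

console.log(hasItemCount("apples,bananas", 2));        // two items
console.log(hasItemCount("apples", 2));                // one item only
console.log(hasItemCount("apples,bananas,pears", 1, 3)); // range check
console.log(hasItemCount("apples,bananas,", 2));       // trailing comma
```

Empty items (for example `a,,b`) are rejected, since each item must contain at least one non-comma character.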

Upvotes: 0

KooiInc

Reputation: 122888

How about using the g (global) modifier, so that match() returns every match, and then counting the results?

var foobar = 'foo,bar',
    foobarbar = 'foo,bar,"bar"',
    foo = 'foo,',
    bar = 'bar';
foo.match(/([^,]+)/g).length === 2; //=> false
bar.match(/([^,]+)/g).length === 2; //=> false
foobar.match(/([^,]+)/g).length === 2; //=> true
foobarbar.match(/([^,]+)/g).length === 2; //=> false
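One thing to watch for with this approach: `String.prototype.match` returns `null` rather than an empty array when nothing matches, so an empty string would throw on `.length`. A minimal sketch of a guarded version follows; the function name `countItems` is hypothetical.

```javascript
// Hypothetical helper: counts the non-empty, comma-separated items.
// match() with the g flag returns null when there are no matches,
// so that case is mapped to 0 explicitly.
function countItems(csv) {
  const matches = csv.match(/[^,]+/g);
  return matches === null ? 0 : matches.length;
}

console.log(countItems("foo,bar") === 2);
console.log(countItems("") === 2);
```

Note that this counts only non-empty items, so `a,,b` is reported as two items rather than three.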

Upvotes: 1

RobG

Reputation: 147343

Depending on how the CSV is formatted, you may be able to split the string on /\",\"/ (i.e. double_quote comma double_quote) and get the length of the resulting array.

Regular expressions aren't very good for parsing, so if the string is complex you may need to parse it some other way.
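A minimal sketch of the split idea, assuming every field is wrapped in double quotes, there is no whitespace around the separating commas, and no field itself contains the sequence `","`. The function name `countQuotedFields` is hypothetical.

```javascript
// Hypothetical helper: counts fields in a line where every field is
// double-quoted, by splitting on the quote-comma-quote separator.
function countQuotedFields(line) {
  return line.split(/","/).length;
}

// The comma inside "b,c" is not preceded and followed by quotes,
// so it does not split the field.
console.log(countQuotedFields('"a","b,c","d"'));
```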

Upvotes: 0

bigblind

Reputation: 12867

// Split on commas and compare the number of items to the limit.
var vals = "something,sthelse,anotherone,woohoo".split(','),
    maxlength = 4;

vals.length <= maxlength; //=> true

This should work in JavaScript.

Upvotes: 0

Arjan

Reputation: 9874

Untested, because I don't know what your input looks like:

/^([^,]+,){1}([^,]+$)/

This requires two fields (one comma, so no comma after the last field).
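To generalise this to N fields, the repetition count on the first group becomes N - 1, which also covers the single-field case with {0}. The following is only a sketch; the function name `hasExactlyNFields` is hypothetical, and it assumes unquoted values without embedded commas, as stated in the question.

```javascript
// Hypothetical helper: builds /^([^,]+,){N-1}[^,]+$/ for a given N.
// N - 1 "field plus comma" groups are followed by one final field,
// so a trailing comma is rejected.
function hasExactlyNFields(csv, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const re = new RegExp("^([^,]+,){" + (n - 1) + "}[^,]+$");
  return re.test(csv);
}

const sample = "apples,bananas,pears,oranges,grapefruit";
console.log(hasExactlyNFields(sample, 5));
console.log(hasExactlyNFields(sample, 4));
console.log(hasExactlyNFields("apples", 1));
```

The same pattern string should also be usable with PHP's preg_match once delimiters are added, though that is untested here.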

Upvotes: 1

Denis de Bernardy

Reputation: 78413

In PHP, you'll be much better off using this function:

http://www.php.net/manual/en/function.str-getcsv.php

It will deal with the likes of:

a,"b,c"

... which contains two items rather than three.

I'm not aware of an equivalent function for JavaScript.
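If only the item count is needed, a small quote-aware scan can stand in for a full parser in JavaScript. This is a sketch under assumptions, not a replacement for str_getcsv: the function name `countCsvFields` is hypothetical, only double quotes are treated as quoting characters, and the input is not validated.

```javascript
// Hypothetical helper: counts the fields in one CSV line, ignoring
// commas that appear inside double-quoted values.
function countCsvFields(line) {
  let count = 1;        // a line with no commas still has one field
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') {
      // An escaped quote ("") toggles twice, leaving the state unchanged.
      inQuotes = !inQuotes;
    } else if (ch === "," && !inQuotes) {
      count += 1;
    }
  }
  return count;
}

console.log(countCsvFields('a,"b,c"'));
```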

Upvotes: 5
