Marcel Stör

Reputation: 23535

Match anything except character unless it's followed by some other character

I've got this odd string:

firstName:Paul Henry,retired:true,message:A, B & more,title:mr

which needs to be split into its <key>:<value> pairs. Unfortunately, the pairs are separated by a comma (,), which can itself be part of a value. Hence, a simple string split on , would not produce the correct result.

Keys contain only word characters and values can contain :.

What I need (I think) is something like

\w*:match-anything-but-comma-unless-comma-is-followed-by-space

What kind of works is

\w*:[\w ?!&%,]*(?![^,])

but of course I wouldn't want to explicitly list all characters in the character class (just listed a few for this example).

Upvotes: 2

Views: 166

Answers (2)

rob mayoff

Reputation: 385670

You are trying to do something complicated with a regular expression that would be simple (and easy to understand) with a little code. That's usually a mistake. Just write a little code.

In your case, you want to split the input on commas. If you get a chunk that doesn't contain a colon, you want to treat it as part of the previous chunk. So just write that. For example, in Python, I'd do it like this:

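text = 'firstName:Paul Henry,retired:true,message:A, B & more,title:mr'
chunks = text.split(',')
associations = []
for chunk in chunks:
    if ':' in chunk:
        # A chunk with a colon starts a new key:value pair.
        associations.append(chunk)
    else:
        # No colon: the comma was part of the previous value, so glue it back.
        associations[-1] += ',' + chunk

# Split at the first colon only, because values may contain ':'.
pairs = dict(association.split(':', 1) for association in associations)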

Upvotes: 0

Ron Rosenfeld

Reputation: 60224

If you want to split on a comma, unless the comma is followed by a space, why not just:

,(?=\S)

Not sure what language you are using, but in C# the line might look like:

splitArray = Regex.Split(subjectString, @",(?=\S)");
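For comparison, here is a rough Python sketch of the same lookahead split, using the sample string from the question. The variable names (text, pairs, result) are only illustrative, and building a dict at the end assumes keys are unique and that only the first colon in each pair separates key from value.

```python
import re

# Sample input taken from the question.
text = "firstName:Paul Henry,retired:true,message:A, B & more,title:mr"

# Split on a comma only when the next character is not whitespace.
pairs = re.split(r",(?=\S)", text)

# Split each pair at the first colon only, since values may contain ':'.
result = dict(pair.split(":", 1) for pair in pairs)
print(result)
```

The comma in "A, B & more" is followed by a space, so the lookahead fails there and the value stays in one piece.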

Upvotes: 3
