Reputation: 23535
I've got this odd string:
firstName:Paul Henry,retired:true,message:A, B & more,title:mr
which needs to be split into its <key>:<value> pairs. Unfortunately, key/value pairs are separated by a comma (,), which can itself be part of a value. Hence, a simple string split at , would not produce the correct result.
Keys contain only word characters and values can contain :.
What I need (I think) is something like
\w*:match-anything-but-comma-unless-comma-is-followed-by-space
What kind of works is
\w*:[\w ?!&%,]*(?![^,])
but of course I don't want to explicitly list every allowed character in the character class (I have listed only a few for this example).
Upvotes: 2
Views: 166
Reputation: 385670
You are trying to do something complicated with a regular expression that would be simple (and easy to understand) with a little code. That's usually a mistake. Just write a little code.
In your case, you want to split the input on commas. If you get a chunk that doesn't contain a colon, you want to treat it as part of the previous chunk. So just write that. For example, in Python, I'd do it like this:
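# `text` holds the input string (renamed from `input`, which shadows a builtin).
chunks = text.split(',')
associations = []
for chunk in chunks:
    if ':' in chunk:
        # A chunk with a colon starts a new key:value pair.
        associations.append(chunk)
    else:
        # No colon: this chunk is the tail of the previous value.
        associations[-1] += ',' + chunk
# maxsplit=1 keeps any ':' inside a value as part of that value.
pairs = dict(association.split(':', 1) for association in associations)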
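As a rough check, the split-and-merge idea above can be wrapped in a function and run on the sample string from the question. This is only a sketch: the name parse_pairs is made up for illustration, and split(':', 1) is an assumption on my part so that a colon inside a value stays in the value.

```python
def parse_pairs(text):
    """Parse 'key:value,key:value' text whose values may contain commas."""
    associations = []
    for chunk in text.split(','):
        if ':' in chunk:
            # A chunk with a colon is treated as the start of a new pair.
            associations.append(chunk)
        else:
            # No colon: glue the chunk back onto the previous value.
            associations[-1] += ',' + chunk
    # Split each pair at the first colon only.
    return dict(a.split(':', 1) for a in associations)


sample = "firstName:Paul Henry,retired:true,message:A, B & more,title:mr"
result = parse_pairs(sample)
print(result)
# {'firstName': 'Paul Henry', 'retired': 'true', 'message': 'A, B & more', 'title': 'mr'}
```

One caveat: if a value has a colon after one of its commas (say "message:A, B: more"), the chunk " B: more" would be taken for a new pair, so this heuristic assumes that does not happen in your data.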
Upvotes: 0
Reputation: 60224
If you want to split on a comma, unless the comma is followed by a space, why not just:
,(?=\S)
Not sure what language you are using, but in C# the line might look like:
splitArray = Regex.Split(subjectString, @",(?=\S)");
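If you happen to be in Python rather than C#, the same lookahead should work with re.split. A minimal sketch, assuming that commas inside values are always followed by whitespace and that separator commas never are (the function name split_pairs is hypothetical):

```python
import re


def split_pairs(text):
    """Split at commas followed by a non-whitespace character, then build a dict."""
    # The lookahead (?=\S) checks the next character without consuming it.
    parts = re.split(r',(?=\S)', text)
    # Split each part at the first colon only, so values may contain ':'.
    return dict(part.split(':', 1) for part in parts)


sample = "firstName:Paul Henry,retired:true,message:A, B & more,title:mr"
pairs = split_pairs(sample)
print(pairs)
# {'firstName': 'Paul Henry', 'retired': 'true', 'message': 'A, B & more', 'title': 'mr'}
```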
Upvotes: 3