Reputation: 2044
I am not great with regular expressions, and I need to parse key/value pairs out of a string. An example of the string would be:
Event Name CallingNumber:+15555555555 CallID:12345 CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497
The result I'm looking for would be something like this:
CallingNumber=+15555555555
CallID=12345
CallingName=Doe, John
CallingTime=12-26-2013 14:27:41.645497
The key/value pairs are delimited by a space, but a value may itself contain spaces (e.g. Doe, John). It would be nice if the values were quoted, but they are not. Essentially, I'm trying to match a word with no spaces followed by a colon, and then capture every character after that colon until the next word with no spaces followed by a colon.
Upvotes: 2
Views: 2135
Reputation: 17139
The match you want isn't fully possible: the fields are delimited with :, but the time in CallingTime also contains :, and a regex can't easily tell those colons apart.
Still, this is what I came up with:
(.+?):(.+?)(?=(?:[^\s]+:)|(?:$))
Again, because of the date, this won't work perfectly.
Here's a fiddle to demonstrate: http://www.rexfiddle.net/Wm3NiK0
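To see the failure concretely, here is a short check using Python's re module in place of the fiddle (the pattern is copied verbatim; .NET uses the same lookahead syntax). On the question's sample, the first key swallows the "Event Name" prefix, every value keeps a trailing space, and the colons in 14:27:41 produce two bogus pairs, ("14", "2") and ("7", "41.645497").

```python
import re

# First pattern from the answer, copied verbatim into a raw string.
FIRST_PATTERN = r"(.+?):(.+?)(?=(?:[^\s]+:)|(?:$))"
LINE = ("Event Name CallingNumber:+15555555555 CallID:12345 "
        "CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497")

# findall returns one (key, value) tuple per match because the pattern has two groups.
pairs = re.findall(FIRST_PATTERN, LINE)
for key, value in pairs:
    print(repr(key), repr(value))
```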
Edit: If your "keys" are only letters (not numbers), which avoids the time/date problem, then this will work:
([A-Za-z]+?):(.+?)\s?(?=(?:[A-Za-z]+:)|(?:$))
Here's another fiddle to demonstrate this: http://www.rexfiddle.net/sGQs7YV
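A minimal sketch of the letters-only pattern applied to the question's sample, using Python's re rather than .NET (the names LETTER_KEY_PATTERN and LINE are illustrative). It yields exactly the four pairs the question asks for; "Event Name" is skipped because neither word is followed by a colon. It would still misfire if a value ever contained a letters-only word directly followed by a colon.

```python
import re

# Letters-only keys (the edited pattern); the optional \s? consumes the space
# before the next key, so values come back without a trailing blank.
LETTER_KEY_PATTERN = r"([A-Za-z]+?):(.+?)\s?(?=(?:[A-Za-z]+:)|(?:$))"
LINE = ("Event Name CallingNumber:+15555555555 CallID:12345 "
        "CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497")

fields = dict(re.findall(LETTER_KEY_PATTERN, LINE))
for key, value in fields.items():
    print(f"{key}={value}")
```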
Upvotes: 2
Reputation: 13914
You can apply a regex repeatedly, using a trailing (.*) group to capture the "yet to be parsed" remainder each time.
In pseudocode form, this might be:
match string to "^(([^:]*\s)*[^:]*)\s+(.*)$"
This should grab "Event Name" as $1 and leave the rest as $3.
loop:
keep only $3 as new base string
match new base string to "^([A-Za-z]+)[:](.+?)(?:\s+([A-Za-z]+[:].*))?$"
key = $1, value = $2, new remainder = $3 (empty for the last pair)
repeat until $3 is empty or the match fails
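A Python sketch of this peel-one-pair-per-iteration loop, offered as an illustration rather than the answerer's code (the names PREFIX, PAIR and parse_pairs are hypothetical). It assumes keys are letters only, so the colons in 14:27:41 are never read as delimiters, and it makes the remainder group optional so the final CallingTime pair is still captured when nothing follows it.

```python
from __future__ import annotations

import re

# Hypothetical names. Keys are assumed to be letters only, so the colons
# inside "14:27:41" are never treated as key delimiters.
PREFIX = re.compile(r"^(.*?)\s*(?=[A-Za-z]+:)")                   # text before the first key
PAIR = re.compile(r"^([A-Za-z]+):(.+?)(?:\s+([A-Za-z]+:.*))?$")  # key, value, optional remainder


def parse_pairs(line: str) -> tuple[str, dict[str, str]]:
    """Return (prefix, pairs), peeling off one key/value pair per iteration."""
    head = PREFIX.match(line)
    if head is None:
        return line, {}  # no "key:" anywhere; the whole line is prefix
    prefix, rest = head.group(1), line[head.end():]
    pairs: dict[str, str] = {}
    while rest:
        m = PAIR.match(rest)
        if m is None:
            break  # remainder does not start with "key:"; stop parsing
        pairs[m.group(1)] = m.group(2)
        rest = m.group(3) or ""  # group 3 is None for the final pair, ending the loop
    return prefix, pairs


LINE = ("Event Name CallingNumber:+15555555555 CallID:12345 "
        "CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497")
prefix, pairs = parse_pairs(LINE)
print(prefix)
for key, value in pairs.items():
    print(f"{key}={value}")
```

The lazy (.+?) value stops at the first point where either a new letters-only key follows or the string ends, which is what keeps "Doe, John" intact.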
Upvotes: 1
Reputation: 2130
"I'm suing .NET (c#)," good idea! :) Microsoft needs to be put in its place!
Do you have a fixed number of fields, or can the number vary? Do you expect the same fields each time, in the same order? With a fixed number of fields you could hard-code them in the regexp, but I still think trying to do it all with one regexp is asking for a headache.
Instead, use some scripting code and break the string down piece by piece, starting by splitting it on :\s*. In each resulting chunk, the last word is the name of the next field, and everything before it is the value of the previous field. The first chunk (the prefix plus the first field name) and the last chunk (only the final value) need special treatment.
I think that is a lot easier to write and understand than one ugly regexp. As a bonus, it handles any number of fields in any order.
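A rough Python version of this split-and-reassign approach (split_fields and KEY_COLON are hypothetical names). One assumption goes beyond the answer: it splits only on colons immediately preceded by a letter, via a lookbehind, because a plain split on : would also cut the 14:27:41 timestamp apart.

```python
from __future__ import annotations

import re

# Assumption: a key delimiter is a ":" directly preceded by a letter, so the
# digit-preceded colons in "14:27:41" are not split on.
KEY_COLON = re.compile(r"(?<=[A-Za-z]):")


def split_fields(line: str) -> dict[str, str]:
    """Split on key colons, then move each chunk's last word over to the next key."""
    chunks = KEY_COLON.split(line)
    if len(chunks) < 2:
        return {}  # no key delimiter found
    # First chunk is "<prefix> <first key>": keep only its last word.
    next_key = chunks[0].split()[-1]
    fields: dict[str, str] = {}
    for chunk in chunks[1:-1]:
        # Middle chunks are "<previous value> <next key>".
        value, _, following = chunk.rpartition(" ")
        fields[next_key] = value.strip()
        next_key = following
    # The last chunk has no trailing key, so all of it is the final value.
    fields[next_key] = chunks[-1].strip()
    return fields


LINE = ("Event Name CallingNumber:+15555555555 CallID:12345 "
        "CallingName:Doe, John CallingTime:12-26-2013 14:27:41.645497")
for key, value in split_fields(LINE).items():
    print(f"{key}={value}")
```

Because the fields are collected into a dictionary, any number of fields in any order is handled, as the answer suggests.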
Upvotes: 0