Kevin

Reputation: 182

Regex to Parse Comma Separated Key-Value Pairs With Commas in Value

I have a comma-separated list that I need to parse into key-value pairs. Here is what the string looks like:

TeacherName='Yoda',TeacherIsJedi=TRUE,TeachersAidNames=('Mace'),TeachersAidAlive=(FALSE),TeachersAidAges=(72),NumberOfStudents=3,StudentAges=(42,59,19)

The patterns it can be broken down into:

  1. Single String value

    TeacherName='Yoda'
    
  2. Single Boolean/numeric value

    TeacherIsJedi=TRUE
    NumberOfStudents=3
    
  3. String Arrays (sometimes with one value)

    TeachersAidNames=('Mace')
    StudentNames=('Anakin','Obi Wan','Luke')
    
  4. Boolean/numeric arrays (sometimes with one value)

    TeachersAidAlive=(FALSE) 
    TeachersAidAges=(72)
    StudentAges=(42,59,19)
    

Keys are alphanumeric with no spaces.

I cannot just split on commas, because they can appear inside strings and also act as the separator within arrays. I felt regex might be a good way to get each key-value pair, which I can then manipulate further.

My understanding of greedy/lazy is limited and it seems like I either match everything after the first = or am only able to match each key without the value. My latest attempt:

,?\w*=\(?.*?\)?

Can someone walk me through a regex pattern that will allow me to match all of these key/value pairs?

Upvotes: 2

Views: 1728

Answers (1)

doom87er

Reputation: 468

Regex101 shows it in action: https://regex101.com/r/BzyfpN/3

This may help:

(?<pair>(?<key>.+?)(?:=)(?<value>[^=]+)(?:,|$))

Only capture a pair if it has both a "key" and a "value":

(?<pair>(?<key>... )... (?<value>... )

The +? is a lazy quantifier: it starts with the smallest match it can and expands one character at a time only until the rest of the pattern can match. Since the capture group "pair" requires "key" to be followed by a '=', it's essentially saying: everything from here up to the first '=' character is the "key".

(?<key>.+?)(?:=)
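A small sketch of the lazy vs. greedy difference for the key, written in Python as an assumption (the question does not name a language). Note that Python's re module spells named groups (?P<name>...) rather than (?<name>...); the sample string is just the first two pairs from the question.

```python
import re

# First two pairs from the question's string.
s = "TeacherName='Yoda',TeacherIsJedi=TRUE"

# Lazy: .+? stops at the FIRST '=' it can reach.
lazy = re.match(r"(?P<key>.+?)=", s)

# Greedy: .+ grabs everything, then backtracks to the LAST '='.
greedy = re.match(r"(?P<key>.+)=", s)

print(lazy.group("key"))    # TeacherName
print(greedy.group("key"))  # TeacherName='Yoda',TeacherIsJedi
```

The lazy version is the one that isolates a single key.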

The value works pretty much the same way, only instead of a lazy + I'm using a greedy +, which makes the largest match it can. So .+(?:,|$) would say: match everything from here up to the last ',' or the end of the string. That is almost what we want, but there is nothing to stop it from including the following pairs in that match. So I excluded the '=' character from the greedy quantifier. Since a '=' is part of every "pair", the value cannot run past the next pair's '=', and the engine backtracks to the last ',' before it. The result is every character of the "pair" except its trailing ','.

(?<value>[^=]+)(?:,|$)
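Putting the pieces together, here is a minimal sketch of the whole pattern in Python (again an assumption, since the question does not name a language). The names PAIR and parse_pairs are made up for illustration, named groups use Python's (?P<name>...) spelling, and the outer "pair" group is dropped because the whole match already covers it.

```python
import re

# Lazy key up to the first '=', then a greedy value that may not contain '=',
# ending at a ',' or at the end of the string.
PAIR = re.compile(r"(?P<key>.+?)=(?P<value>[^=]+)(?:,|$)")

def parse_pairs(text):
    """Return a dict of raw key -> raw value strings (hypothetical helper)."""
    return {m.group("key"): m.group("value") for m in PAIR.finditer(text)}

s = ("TeacherName='Yoda',TeacherIsJedi=TRUE,TeachersAidNames=('Mace'),"
     "TeachersAidAlive=(FALSE),TeachersAidAges=(72),NumberOfStudents=3,"
     "StudentAges=(42,59,19)")

pairs = parse_pairs(s)
for key, value in pairs.items():
    print(key, "->", value)
```

The values come back as raw text such as 'Yoda' or (42,59,19), so quotes and parentheses still need to be handled afterwards. One caveat: because the value excludes '=', a string value that itself contains a '=' would be split in the wrong place.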

Upvotes: 2
