mzbaran

Reputation: 624

Regex to extract variable number of key value pairs

Though I have found quite a few SO posts talking about using regex to extract key/value pairs, I was not able to find a solution for my particular use case.

I have key-value pairs like this:

{date=2020-07-22, labelId=100000004}

The number of key/value pairs will vary.

I would like a regular expression that extracts these as key and value "groups", like groups[1:] = "date", "2020-07-22", "labelId", "100000004"

This sort of matches correctly for the first match,

([a-zA-Z0-9]+)=((?:[^\\\\\"]|\\\\.)*+)

...but I need a way to "split" on the comma

Are any regex gurus able to help me out with this one?

Thanks in advance.

Upvotes: 0

Views: 2176

Answers (3)

Ryszard Czech

Reputation: 18631

You can do without a regex here:

text = '{date=2020-07-22, labelId=100000004}'
print(dict([x.split('=') for x in text.strip('{}').split(', ')]))

See Python proof.

That is, remove braces on both ends of the string, split with comma-space, and then split with =.

Results: {'date': '2020-07-22', 'labelId': '100000004'}
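If the spacing after the commas is not always exactly one space, or a value may itself contain an =, the one-liner above can be loosened a little. The following is only a sketch under those assumptions; the helper name parse_pairs is hypothetical and not part of the original answer.

```python
def parse_pairs(text):
    """Parse a string like '{k1=v1, k2=v2}' into a dict without a regex."""
    body = text.strip().strip('{}')  # drop surrounding whitespace and braces
    pairs = {}
    for item in body.split(','):
        if not item.strip():
            continue  # skip empty pieces, e.g. when the input is '{}'
        # partition splits on the first '=' only, so 'a=b=c' gives key 'a', value 'b=c'
        key, _, value = item.partition('=')
        pairs[key.strip()] = value.strip()
    return pairs


print(parse_pairs('{date=2020-07-22, labelId=100000004}'))
```

This still assumes that values never contain a comma.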

Upvotes: 1

Barmar

Reputation: 781761

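Use a pattern that excludes {, =, commas and whitespace from the key, and } and , from the value of each match. Excluding the comma and whitespace from the key keeps the ", " separator out of the second and later keys.

import re

data = '{date=2020-07-22, labelId=100000004}'
regex = r'([^{=,\s]+)=([^,}]+)'

print(re.findall(regex, data))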

Note that this regex doesn't allow for quoting the key and/or value to let them include the delimiter characters. That makes using regular expressions much more complicated.
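To give a feel for how the pattern grows once quoting is allowed, here is a rough sketch. The quoting syntax is an assumption (double-quoted values with backslash escapes), since the question does not define one, and the names PAIR and parse_quoted_pairs are hypothetical.

```python
import re

# Key: one or more word characters.
# Value: either a double-quoted string (backslash escapes allowed)
# or a bare run of characters up to the next ',' or '}'.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,}]+)')


def parse_quoted_pairs(text):
    result = {}
    for key, value in PAIR.findall(text):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Strip the outer quotes and resolve backslash escapes.
            value = re.sub(r'\\(.)', r'\1', value[1:-1])
        else:
            value = value.strip()
        result[key] = value
    return result


print(parse_quoted_pairs('{name="Doe, John", id=7}'))
```

Only values can be quoted in this sketch; quoted keys would need a similar alternation on the key side.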

Upvotes: 2

Wiktor Stribiżew

Reputation: 627103

You can use

import re
text = "{date=2020-07-22, labelId=100000004}"
print(dict(re.findall(r'([a-zA-Z0-9]+)=([\d-]+)', text)))
# => {'date': '2020-07-22', 'labelId': '100000004'}

See the Python demo and the regex demo.

Regex details:

  • ([a-zA-Z0-9]+) - Group 1: any one or more letters or digits
  • = - a = char
  • ([\d-]+) - Group 2: one or more digits or -.
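The question asked for a flat sequence of keys and values rather than a dict. One possible way to get that from the same pattern is to flatten the tuples returned by re.findall; this is a sketch, and the variable names pairs and flat are only illustrative.

```python
import re
from itertools import chain

text = "{date=2020-07-22, labelId=100000004}"

# findall returns one (key, value) tuple per match
pairs = re.findall(r'([a-zA-Z0-9]+)=([\d-]+)', text)

# chain.from_iterable concatenates the tuples into a single flat list
flat = list(chain.from_iterable(pairs))
print(flat)
# => ['date', '2020-07-22', 'labelId', '100000004']
```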

Upvotes: 1
