Aditya

Reputation: 571

match key-value pair

In the following text, I want to extract the keys with their values. I've written the following regex, but it does not match the values across multiple lines. Regex: --(.*)=.*(?=(.|--|\n|\Z)*)

--some text here not to be matched
--key1=this is a
 multiline statement
 statement
--random text not to be matched
--key2=val2
--key3=val3
--random text here not to be matched

So, after matching, the output should be:

--key1=this is a
 multiline statement
 statement
--key2=val2
--key3=val3

Upvotes: 1

Views: 101

Answers (3)

Haleemur Ali

Reputation: 28303

Perhaps the OP provided a simplistic example and, in actual code, a regex will be required, but the example above can be filtered without a regex.

The central insight in this method of filtering out the junk lines is to remove all lines that start with -- but do not contain =.

text = """--some text here not to be matched
--key1=this is a
 multiline statement
 statement
--random text not to be matched
--key2=val2
--key3=val3
--random text here not to be matched"""

# Drop every line that starts with '--' but has no '='; keep everything else.
valid_lines = [l for l in text.split('\n') if not (l.startswith('--') and '=' not in l)]

result = '\n'.join(valid_lines)

print(result)
# output:
# --key1=this is a
#  multiline statement
#  statement
# --key2=val2
# --key3=val3

To construct a dictionary out of the result text:

# Split each chunk on the first '=' only, so a value containing '=' stays intact.
mydata = {data.split('=', 1)[0]: data.split('=', 1)[1].strip('\n') for data in result.strip('-').split('--')}
print(mydata)
# outputs:
# {'key1': 'this is a\n multiline statement\n statement', 'key2': 'val2', 'key3': 'val3'}
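Splitting the joined text on '--' breaks if a value itself contains '--' in the middle of a line (for example --a=x--y). As a hedged alternative, and not part of the answer above, the same filtering idea can be applied line by line, so that a value only ends at the next line that starts with '--'. The function name parse_key_values is hypothetical, and the sketch assumes that a pair starts on a '--' line containing '=' and that every other non-'--' line continues the previous value.

```python
def parse_key_values(text):
    """Group '--key=value' lines and their continuation lines into a dict.

    Hypothetical helper (a sketch, not from the answer above). Assumes:
    - a pair starts on a line beginning with '--' that contains '=';
    - a line beginning with '--' without '=' is junk and ends the current pair;
    - any other line continues the value of the current pair.
    """
    pairs = {}
    current_key = None
    for line in text.split('\n'):
        if line.startswith('--'):
            if '=' in line:
                # Split on the first '=' only, so values may contain '='.
                key, value = line[2:].split('=', 1)
                pairs[key] = value
                current_key = key
            else:
                current_key = None  # junk line: stop collecting
        elif current_key is not None:
            # Continuation line: keep its newline and leading space as-is.
            pairs[current_key] += '\n' + line
    return pairs


text = """--some text here not to be matched
--key1=this is a
 multiline statement
 statement
--random text not to be matched
--key2=val2
--key3=val3
--random text here not to be matched"""

print(parse_key_values(text))
# {'key1': 'this is a\n multiline statement\n statement', 'key2': 'val2', 'key3': 'val3'}
```

Because the loop decides per line rather than splitting the whole string, a '--' inside a value such as --a=x--y is kept as part of the value.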

Upvotes: 0

c2huc2hu

Reputation: 2497

Ajax's answer will fail if any of the values contain a hyphen (-). Instead, use a negative lookahead to ensure that the values do not contain --.

This regex will do that: --.+=((?!--)[\S\s])+

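To use this pattern from Python, note that the group ((?!--)[\S\s]) repeats, so re.findall would return only the last character that group matched, not the key-value text. A minimal sketch, assuming Python's re module and the question's sample text with key2's value changed to the hypothetical val-2 (to show that a single hyphen survives), reads the whole matches with re.finditer instead:

```python
import re

text = """--some text here not to be matched
--key1=this is a
 multiline statement
 statement
--random text not to be matched
--key2=val-2
--key3=val3
--random text here not to be matched"""

# c2huc2hu's pattern: '--', a key up to the last '=' on that line, then any
# characters (newlines included) as long as '--' does not start at that point.
pattern = re.compile(r'--.+=((?!--)[\S\s])+')

# finditer + group(0) returns each full match; rstrip drops the newline
# that the match picks up just before the next '--' line.
matches = [m.group(0).rstrip('\n') for m in pattern.finditer(text)]
for match in matches:
    print(match)
```

Since '.' does not cross newlines, the --random lines never reach an '=' and are skipped, while the hyphen in val-2 is kept because only a double hyphen stops the value.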

Upvotes: 1

Ajax1234

Reputation: 71461

You can try this:

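import re
s = """
--some text here not to be matched
--key1=this is a
multiline statement
statement
--random text not to be matched
--key2=val2
--key3=val3
--random text here not to be matched
"""
# A raw string avoids invalid-escape warnings. The class [a-zA-Z\s] has no
# digits or '-', so 'val2' is cut to 'val' (see the output below).
new_data = re.findall(r'--\w+=[a-zA-Z\s]+', s)
for i in new_data:
    print(i)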

Output:

--key1=this is a
multiline statement
statement
--key2=val
--key3=val
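The output shows val2 cut down to val: the class [a-zA-Z\s] excludes digits, and hyphens as well, which is the failure c2huc2hu points out. One alternative, sketched here as a different pattern rather than this answer's, matches the --key= line and then any following lines that do not start with '--'. It assumes keys consist of \w characters and that no continuation line begins with '--'; key3's value is changed to the hypothetical val-3 to show that a hyphen is kept.

```python
import re

s = """
--some text here not to be matched
--key1=this is a
multiline statement
statement
--random text not to be matched
--key2=val2
--key3=val-3
--random text here not to be matched
"""

# A '--key=' line, the rest of that line, then zero or more following lines
# that do not start with '--'. '.' never crosses a newline, so each
# continuation line is consumed explicitly through '\n'.
pattern = re.compile(r'^--\w+=.*(?:\n(?!--).*)*', re.MULTILINE)

# The pattern has no capturing groups, so findall returns whole matches.
for match in pattern.findall(s):
    print(match)
```

Lines such as --random text... fail at the '=' right after \w+, so they are skipped, and values are no longer limited to letters and whitespace.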

Upvotes: 2
