Reputation: 8894
This is not a question asking how to use re.findall()
or the global modifier (?g)
or \g
. This is asking how to match n
groups with one regex expression, with n
between 3 and 5.
Rules:
#
(comments) ITEM1
, ITEM2
, ITEM3
class ITEM1(stuff)
model = ITEM2
fields = (ITEM3)
write_once_fields = (ITEM4)
required_fields = (ITEM5)
None
if there is no match, or retrieve pairs.My question is if this is doable, and how?
I've gotten this far, but it hasn't dealt with comments or unknown order or if some items are missing and to stop searching for this particular regex when you see the next class
definition. https://www.regex101.com/r/cG5nV9/8
(?s)\nclass\s(.*?)(?=\()
.*?
model\s=\s(.*?)\n
.*?
(?=fields.*?\((.*?)\))
.*?
(?=write_once_fields.*?\((.*?)\))
.*?
(?=required_fields.*?\((.*?)\))
Do I need a conditional?
Thanks for any kinds of hints.
Upvotes: 3
Views: 818
Reputation: 54213
I'd do something like:
from collections import defaultdict
import re
comment_line = re.compile(r"\s*#")
matches = defaultdict(dict)
with open('path/to/file.txt') as inf:
d = {} # should catch and dispose of any matching lines
# not related to a class
for line in inf:
if comment_line.match(line):
continue # skip this line
if line.startswith('class '):
classname = line.split()[1]
d = matches[classname]
if line.startswith('model'):
d['model'] = line.split('=')[1].strip()
if line.startswith('fields'):
d['fields'] = line.split('=')[1].strip()
if line.startswith('write_once_fields'):
d['write_once_fields'] = line.split('=')[1].strip()
if line.startswith('required_fields'):
d['required_fields'] = line.split('=')[1].strip()
You could probably do this easier with regex matching.
comment_line = re.compile(r"\s*#")
class_line = re.compile(r"class (?P<classname>)")
possible_keys = ["model", "fields", "write_once_fields", "required_fields"]
data_line = re.compile(r"\s*(?P<key>" + "|".join(possible_keys) +
r")\s+=\s+(?P<value>.*)")
with open( ...
d = {} # default catcher as above
for line in ...
if comment_line.match(line):
continue
class_match = class_line.match(line)
if class_match:
d = matches[class_match.group('classname')]
continue # there won't be more than one match per line
data_match = data_line.match(line)
if data_match:
key,value = data_match.group('key'), data_match.group('value')
d[key] = value
But this might be harder to understand. YMMV.
Upvotes: 1