Reputation: 19203
########################################
# some comment
# other comment
########################################
block1 {
value=data
some_value=some other kind of data
othervalue=032423432
}
block2 {
value=data
some_value=some other kind of data
othervalue=032423432
}
Upvotes: 3
Views: 730
Reputation: 414179
Grako (for grammar compiler) lets you separate the input format specification (grammar) from its interpretation (semantics). Here's a grammar for your input format in Grako's variety of EBNF:
(* a file contains zero or more blocks *)
file = {block} $;
(* a named block has at least one assignment statement *)
block = name '{' {assignment}+ '}';
assignment = name '=' value NEWLINE;
name = /[a-z][a-z0-9_]*/;
value = integer | string;
NEWLINE = /\n/;
integer = /[0-9]+/;
(* string value is everything until the next newline *)
string = /[^\n]+/;
To install grako, run pip install grako. To generate the PEG parser from the grammar:
$ grako -o config_parser.py Config.ebnf
To convert stdin into JSON using the generated config_parser module:
#!/usr/bin/env python
import json
import string
import sys
from config_parser import ConfigParser

class Semantics(object):
    def file(self, ast):
        # file = {block} $
        # all blocks should have unique names within the file
        return dict(ast)
    def block(self, ast):
        # block = name '{' {assignment}+ '}'
        # all assignment statements should use unique names
        return ast[0], dict(ast[2])
    def assignment(self, ast):
        # assignment = name '=' value NEWLINE
        # value = integer | string
        return ast[0], ast[2] # name, value
    def integer(self, ast):
        return int(ast)
    def string(self, ast):
        return ast.strip() # remove leading/trailing whitespace

parser = ConfigParser(whitespace='\t\n\v\f\r ', eol_comments_re="#.*?$")
ast = parser.parse(sys.stdin.read(), rule_name='file', semantics=Semantics())
json.dump(ast, sys.stdout, indent=2, sort_keys=True)
Output:
{
  "block1": {
    "othervalue": 32423432,
    "some_value": "some other kind of data",
    "value": "data"
  },
  "block2": {
    "othervalue": 32423432,
    "some_value": "some other kind of data",
    "value": "data"
  }
}
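The comments in the Semantics class say that block and assignment names should be unique, but dict() silently keeps only the last of any duplicates. If a duplicate should be an error instead, something like the following could replace the dict() calls. This is only a sketch: unique_dict is a hypothetical helper name, not part of Grako.

```python
def unique_dict(pairs):
    """Build a dict from (key, value) pairs, rejecting duplicate keys.

    Hypothetical helper (not part of Grako): a stricter stand-in for
    dict(...) when duplicate names should raise instead of being overwritten.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate name: %r" % (key,))
        result[key] = value
    return result


if __name__ == "__main__":
    print(unique_dict([("value", "data"), ("othervalue", 32423432)]))
```

With this in place, return dict(ast) in file() would become return unique_dict(ast), and likewise for dict(ast[2]) in block().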
Upvotes: 1
Reputation: 414179
The best way would be to use an existing format such as JSON.
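For illustration, here is a sketch of how the sample data might look if it were stored as JSON and read with the standard json module. The JSON layout is my assumption. Note that a JSON number cannot have a leading zero, so othervalue is kept as a string here to preserve it.

```python
import json

# Assumed JSON equivalent of the sample config; the layout is illustrative.
CONFIG_JSON = """
{
  "block1": {
    "value": "data",
    "some_value": "some other kind of data",
    "othervalue": "032423432"
  },
  "block2": {
    "value": "data",
    "some_value": "some other kind of data",
    "othervalue": "032423432"
  }
}
"""

# json.loads returns plain dicts, so no custom parser is needed.
config = json.loads(CONFIG_JSON)
print(config["block1"]["some_value"])
```

The trade-off is that JSON has no comment syntax, so the # comment lines from the original format would have to go elsewhere.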
Here's an example parser for your format:
from lepl import (AnyBut, Digit, Drop, Eos, Integer, Letter,
                  NON_GREEDY, Regexp, Space, Separator, Word)

# EBNF
# name = ( letter | "_" ) , { letter | "_" | digit } ;
name = Word(Letter() | '_',
            Letter() | '_' | Digit())
# words = word , space+ , word , { space+ , word } ;
# two or more space-separated words (non-greedy to allow comment at the end)
words = Word()[2::NON_GREEDY, ~Space()[1:]] > list
# value = integer | word | words ;
value = (Integer() >> int) | Word() | words
# comment = "#" , { all characters - "\n" } , ( "\n" | EOF ) ;
comment = '#' & AnyBut('\n')[:] & ('\n' | Eos())

with Separator(~Regexp(r'\s*')):
    # statement = name , "=" , value ;
    statement = name & Drop('=') & value > tuple
    # suite = "{" , { comment | statement } , "}" ;
    suite = Drop('{') & (~comment | statement)[:] & Drop('}') > dict
    # block = name , suite ;
    block = name & suite > tuple
    # config = { comment | block } ;
    config = (~comment | block)[:] & Eos() > dict

from pprint import pprint
pprint(config.parse(open('input.cfg').read()))
Output:
[{'block1': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'},
  'block2': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'}}]
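In the output above, a multi-word value such as some_value comes back as a list of words, and the whole result is wrapped in a one-element list. If plain strings are preferred, a small post-processing step can rejoin them. A sketch, assuming single spaces between words are acceptable; join_words is a hypothetical helper, not part of LEPL.

```python
def join_words(blocks):
    """Return a copy of {block: {key: value}} with list values joined by spaces."""
    return {
        block_name: {
            key: " ".join(value) if isinstance(value, list) else value
            for key, value in settings.items()
        }
        for block_name, settings in blocks.items()
    }


# Same shape as the parser result shown above (one dict inside a list).
parsed = [{'block1': {'othervalue': 32423432,
                      'some_value': ['some', 'other', 'kind', 'of', 'data'],
                      'value': 'data'}}]

print(join_words(parsed[0]))
```

Any run of several spaces in the original value is collapsed to one by this step, since the parser has already discarded the separators.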
Upvotes: 6
Reputation: 18564
If you do not really mean parsing, but rather text processing, and the input data really is that regular, then go with John's solution. If you do need real parsing (for example, the data follows somewhat more complex rules), then, depending on how much data you need to parse, I'd go with either pyparsing or simpleparse. I've tried both of them, and pyparsing was too slow for me.
Upvotes: 3
Reputation: 42183
Well, the data looks pretty regular. So you could do something like this (untested):
class Block(object):
    def __init__(self, name):
        self.name = name

infile = open(...) # insert filename here
current = None
blocks = []

for line in infile:
    if line.lstrip().startswith('#'):
        continue  # skip comment lines
    elif line.rstrip().endswith('{'):
        current = Block(line.split()[0])  # start a new block
    elif '=' in line:
        # split on the first '=' only, so values may themselves contain '='
        attr, value = line.strip().split('=', 1)
        try:
            value = int(value)
        except ValueError:
            pass  # not an integer: keep it as a string
        setattr(current, attr, value)
    elif line.rstrip().endswith('}'):
        blocks.append(current)
The result will be a list of Block instances, where block.name will be the name ('block1', 'block2', etc.) and other attributes correspond to the keys in your data. So, blocks[0].value will be 'data', etc. Note that this only handles strings and integers as values.
(There is an obvious bug here if your keys can ever include 'name'. You might like to change self.name to self._name or something if this can happen.)
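One way to sidestep the 'name' collision entirely is to keep each block's keys in a dict rather than as attributes. Here is a minimal sketch of the same line-by-line approach under that assumption. parse_blocks is a hypothetical name, and it takes any iterable of lines, so it works on an open file as well as on a list of strings.

```python
def parse_blocks(lines):
    """Parse 'name { key=value ... }' blocks into {name: {key: value}}."""
    blocks = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.endswith('{'):
            # start (or reopen) the block named before the brace
            current = blocks.setdefault(line[:-1].strip(), {})
        elif line == '}':
            current = None
        elif '=' in line and current is not None:
            key, value = line.split('=', 1)
            key, value = key.strip(), value.strip()
            try:
                value = int(value)  # integers become int, the rest stay str
            except ValueError:
                pass
            current[key] = value
    return blocks


sample = """\
# some comment
block1 {
value=data
some_value=some other kind of data
othervalue=032423432
}
"""
print(parse_blocks(sample.splitlines()))
```

As with the version above, only strings and integers are handled, and assignments outside a block are silently ignored.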
HTH!
Upvotes: 4