user741592

Reputation: 925

Parsing specific contents in a file

I have a file that looks like this

    !--------------------------------------------------------------------------DISK
    [DISK]
    DIRECTION  =  'OK'
    TYPE       =  'normal'

    !------------------------------------------------------------------------CAPACITY
    [CAPACITY]
    code            =    0
    ID          =   110

I want to read the sections [DISK] and [CAPACITY]. There will be more sections like these, and I want to read the parameters defined under each of them.

I wrote the following code:

file_open = open(myFile,"r")
all_lines = file_open.readlines()
count = len(all_lines)
file_open.close()
my_data = {}
section = None
data = ""
for line in all_lines:
  line = line.strip()                               #remove whitespace
  line = line.replace(" ", "")      
  if len(line) != 0:               # remove white spaces between data        
      if line[0] == "[":
          section = line.strip()[1:]
          data = ""
      if line[0] !="[":
          data += line + "," 
          my_data[section] = [bit for bit in data.split(",") if bit != ""]
print my_data
key = my_data.keys()
print key   

Unfortunately, I am unable to get those sections and the data under them. Any ideas would be helpful.

Upvotes: 1

Views: 149

Answers (3)

Demian Brecht

Reputation: 21368

Are you able to make a small change to the text file? If you can make it look like this (only changed the comment character):

#--------------------------------------------------------------------------DISK
[DISK]
DIRECTION  =  'OK'
TYPE       =  'normal'

#------------------------------------------------------------------------CAPACITY
[CAPACITY]
code            =    0
ID          =   110

Then parsing it is trivial:

from ConfigParser import SafeConfigParser

parser = SafeConfigParser()
parser.read('filename')

And getting data looks like this:

(Pdb) parser
<ConfigParser.SafeConfigParser instance at 0x100468dd0>
(Pdb) parser.get('DISK', 'DIRECTION')
"'OK'"

Edit based on comments:

If you're using Python <= 2.7, then you're a little SOL. The only real way would be to subclass ConfigParser and implement a custom _read method. In practice, you'd copy the _read implementation from Lib/ConfigParser.py and edit the comment characters in line 477 (2.7.3):

if line.strip() == '' or line[0] in '#;': # add new comment characters in the string
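A less invasive alternative, which is not the subclassing approach described above, is to drop the ! lines yourself and hand the rest to the stock parser. This is only a minimal sketch, assuming ! appears solely as a full-line comment marker; parse_ignoring_bang is a made-up helper name, SAMPLE is a stand-in for the real file, and the try/except import is there so the same code can run on both 2.7 and 3.x:

```python
import io

try:
    from configparser import ConfigParser                      # Python 3
except ImportError:
    from ConfigParser import SafeConfigParser as ConfigParser  # Python 2


def parse_ignoring_bang(text):
    """Hypothetical helper: parse INI-style text, skipping '!' comment lines."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("!")]
    buf = io.StringIO(u"\n".join(kept))
    parser = ConfigParser()
    # read_file is the Python 3 name; readfp is the Python 2 name.
    reader = getattr(parser, "read_file", None) or parser.readfp
    reader(buf)
    return parser


# Stand-in for the contents of the real file (assumption for illustration).
SAMPLE = """\
!--------------------------------------------------------------------------DISK
[DISK]
DIRECTION  =  'OK'
TYPE       =  'normal'

!------------------------------------------------------------------------CAPACITY
[CAPACITY]
code            =    0
ID          =   110
"""

parser = parse_ignoring_bang(SAMPLE)
disk_type = parser.get("DISK", "TYPE")  # quotes are part of the raw value
```

The trade-off is that the whole file is held in memory once, which should be fine for small config files.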

However, if you're running Python 3.2 or later, you're in luck: the module is named configparser there, and ConfigParser accepts a comment_prefixes __init__ param that lets you customize the prefixes:

from configparser import ConfigParser

parser = ConfigParser(comment_prefixes=('#', ';', '!'))
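For illustration, here is a minimal end-to-end sketch of that option on Python 3. It assumes the file contents are already in a string (SAMPLE is a stand-in for the real file); with a file on disk you would call parser.read(path) instead of read_string:

```python
from configparser import ConfigParser

# Stand-in for the original, unmodified file (assumption for illustration).
SAMPLE = """\
!--------------------------------------------------------------------------DISK
[DISK]
DIRECTION  =  'OK'
TYPE       =  'normal'

!------------------------------------------------------------------------CAPACITY
[CAPACITY]
code            =    0
ID          =   110
"""

# Treat '!' as a full-line comment marker in addition to the defaults.
parser = ConfigParser(comment_prefixes=('#', ';', '!'))
parser.read_string(SAMPLE)

direction = parser.get('DISK', 'DIRECTION')      # raw value, quotes included
direction_unquoted = direction.strip("'")        # drop the surrounding quotes
capacity_id = parser.getint('CAPACITY', 'ID')    # converted to int
```

Note that values come back as strings with the quotes intact, and that option names are matched case-insensitively by default.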

Upvotes: 1

sloth

Reputation: 101042

As others already pointed out, you should be able to use the ConfigParser module.


Nonetheless, if you want to implement the reading/parsing yourself, you should split it up into two parts.

Part 1 would be the parsing at file level: splitting the file up into blocks (in your example you have two blocks: DISK and CAPACITY).

Part 2 would be parsing the blocks themselves to get the values.

You know you can ignore the lines starting with !, so let's skip those:

with open('myfile.txt', 'r') as f:
    content = [l for l in f.readlines() if not l.startswith('!')]

Next, read the lines into blocks:

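def partition_by(l, f):
    # start a new block at every element for which f(e) is true
    t = []
    for e in l:
        if f(e):
            if t:
                yield t
            t = []
        t.append(e)
    if t:  # don't yield an empty block when the input is empty
        yield t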

blocks = partition_by(content, lambda l: l.startswith('['))

and finally read in the values for each block:

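def parse_block(block):
    gen = iter(block)
    # the first line is the header, e.g. "[DISK]" -> "DISK"
    block_name = next(gen).strip()[1:-1]
    # split on the first '=' only, so values may themselves contain '='
    splitted = [e.split('=', 1) for e in gen]
    # lines without '=' (e.g. blank lines) are skipped
    values = {t[0].strip(): t[1].strip() for t in splitted if len(t) == 2}
    return block_name, values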

result = [parse_block(b) for b in blocks]

That's it. Let's have a look at the result:

for section, values in result:
    print section, ':'
    for k, v in values.items():
        print '\t', k, '=', v

output:

DISK :
        DIRECTION = 'OK'
        TYPE = 'normal'
CAPACITY :
        code = 0
        ID = 110

Upvotes: 1

Asterisk

Reputation: 3574

If the file is not big, you can load it into memory and use regular expressions to find the parts that are of interest to you.
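As a rough sketch of what that could look like (the patterns and the parse_sections name are illustrative assumptions, not something taken from the question): one pattern splits the text at the [SECTION] headers, and a second one picks key = value pairs out of each chunk while skipping lines that start with !.

```python
import re

# A header line such as "[DISK]", optionally indented.
HEADER_RE = re.compile(r"^[ \t]*\[([^\]\n]+)\][ \t]*$", re.MULTILINE)
# A "key = value" line; keys may not start with '!' (comment) or '='.
PAIR_RE = re.compile(
    r"^[ \t]*([^!=\s][^=\n]*?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def parse_sections(text):
    """Hypothetical helper: return {section: {key: value}} with string values."""
    # split() keeps the captured names: [preamble, name1, body1, name2, body2, ...]
    parts = HEADER_RE.split(text)
    sections = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        sections[name] = dict(PAIR_RE.findall(body))
    return sections


# Stand-in for the contents of the real file (assumption for illustration).
SAMPLE = """\
!--------------------------------------------------------------------------DISK
[DISK]
DIRECTION  =  'OK'
TYPE       =  'normal'

!------------------------------------------------------------------------CAPACITY
[CAPACITY]
code            =    0
ID          =   110
"""

data = parse_sections(SAMPLE)
```

With a real file you would pass open(path).read() instead of SAMPLE. All values stay as strings, quotes included, so any type conversion is left to the caller.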

Upvotes: -1
