Reputation: 64135

Parsing a generic data file in python

I have a simple data file in which every line has the format title : data. For example (the real file is the same, just with more data):

Images : 50
Total cells : 532
Viable cells : 512
Nonviable cells : 20

Right now to parse this I have the following code for every data piece I want:

if data[1][:13] == "Total cells :":
    result.append(data[1][14:-1])

This feels like a really dirty solution, though. What would be a cleaner way to solve this problem?

Upvotes: 0

Answers (3)

dnozay

Reputation: 24324

You can use str.split(), but str.partition() is simpler here. Here are the help texts:

For partition:

partition(...)
    S.partition(sep) -> (head, sep, tail)

    Search for the separator sep in S, and return the part before it,
    the separator itself, and the part after it.  If the separator is not
    found, return S and two empty strings.

For split:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

I would recommend going with the easy interface:

>>> line = "Images : 50"
>>> key, sep, value = line.partition(" : ")
>>> key, sep, value
('Images', ' : ', '50')
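
The practical difference shows up when a line has no separator. A minimal sketch, with sample strings made up for illustration: partition always returns three items, while the list returned by split changes length, so unpacking it into two names can fail.

```python
# Sample lines; the second one has no " : " separator (made up for illustration).
good = "Images : 50"
bad = "no separator here"

# partition always returns a 3-tuple, so unpacking into three names never fails.
good_parts = good.partition(" : ")
bad_parts = bad.partition(" : ")

# split returns a list whose length depends on the input.
good_split = good.split(" : ", 1)
bad_split = bad.split(" : ", 1)

print(good_parts)
print(bad_parts)
print(good_split)
print(bad_split)
```

With partition, an empty sep in the result is a cheap way to detect a line that did not match.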

You could use something along these lines:

result = {}
for line in data:
    # this assumes : is always surrounded by spaces.
    key, sep, value = line.partition(" : ")
    # seems like value is a number...
    result[key] = int(value)
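
If the real file may contain blank lines or non-numeric values, the loop above would raise ValueError at int(value). A slightly more tolerant sketch follows; parse_lines is a hypothetical helper name, and keeping non-integer values as strings is an assumption, not something the question specifies.

```python
def parse_lines(lines):
    """Build a dict from 'title : data' lines (hypothetical helper)."""
    result = {}
    for line in lines:
        key, sep, value = line.partition(" : ")
        if not sep:
            # No " : " on this line (blank line, header, ...): skip it.
            continue
        value = value.strip()
        try:
            result[key.strip()] = int(value)
        except ValueError:
            # Assumption: non-integer values are kept as plain strings.
            result[key.strip()] = value
    return result


# Made-up sample, including a blank line and a non-numeric value.
sample = ["Images : 50\n", "\n", "Total cells : 532\n", "Operator : jane\n"]
parsed = parse_lines(sample)
print(parsed)
```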

Upvotes: 0

Martin Konecny

Reputation: 59611

If you want this data file in a nice dictionary, you can do the following:

d = {}
for line in data:
    key, value = line.split(':')
    # strip the padding around the key; int() ignores surrounding whitespace
    d[key.strip()] = int(value)

Printing d will show:

{'Images': 50, 'Total cells': 532, 'Viable cells': 512, 'Nonviable cells': 20}

This assumes none of your "keys" or "values" have : in them.

You can then access any of the elements (e.g. "Total cells") like so:

print(d['Total cells'])
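
To illustrate the caveat about colons, here is a small sketch using a made-up line whose value contains one. A plain split(':') yields three pieces and the two-name unpacking fails; passing a maxsplit of 1 cuts only at the first colon.

```python
line = "Start time : 12:30"  # made-up line whose value contains a colon

try:
    key, value = line.split(":")
except ValueError:
    # Three pieces come back, so two-name unpacking fails.
    key, value = None, None

# Limiting the split to one cut keeps the rest of the value intact.
key2, value2 = line.split(":", 1)
key2, value2 = key2.strip(), value2.strip()
print(key2, value2)
```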

Upvotes: 3

Martijn Pieters

Reputation: 1122172

You can simply split the line on ' : ':

key, value = data[1].split(' : ', 1)

Now you have the two elements of the line separated into two variables. You may want to strip these of extraneous whitespace:

key, value = map(str.strip, data[1].split(':', 1))

Demo:

>>> list(map(str.strip, 'Images : 50'.split(':')))
['Images', '50']
>>> list(map(str.strip, 'Total cells : 532'.split(':')))
['Total cells', '532']
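
Applied to a whole file, the same split-and-strip step might look like the sketch below. io.StringIO stands in for an open file so the example is self-contained, and the variable names are illustrative only. Values are left as strings here.

```python
import io

# io.StringIO stands in for an open file; with a real file you would
# iterate over the object returned by open() in the same way.
f = io.StringIO(
    u"Images : 50\n"
    u"Total cells : 532\n"
    u"Viable cells : 512\n"
    u"Nonviable cells : 20\n"
)

cells = {}
for line in f:
    if ':' not in line:
        continue  # skip blank or malformed lines
    # Split at the first colon only, then trim whitespace from both halves.
    key, value = (part.strip() for part in line.split(':', 1))
    cells[key] = value

print(cells['Total cells'])
```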

Upvotes: 3
