neversaint

Reputation: 64004

Parsing line with delimiter in Python

I have lines of data which I want to parse. The data looks like this:

a score=216 expect=1.05e-06
a score=180 expect=0.0394

What I want is a subroutine that parses them and returns 2 values (score and expect) for each line.

However this function of mine doesn't seem to work:

def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval  = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")

Please advise: what's the right way to do it?

Upvotes: 0

Views: 2011

Answers (3)

Alex Martelli

Reputation: 881635

If mafLines is a list of lines, and you want to look just at the first one, .split() that line to obtain the words. For example:

def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    for word in mafLines[0].split():
        if word.startswith('score='):
            # partition returns (head, separator, tail); keep only the tail
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval

Note that this will return a tuple with two string items; if you want an int and a float, for example, you need to change the last line to

    return int(theScore), float(theEval)

and then you'll get a ValueError exception if either string is invalid for the type it's supposed to represent, and the returned tuple with two numbers if both strings are valid.
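
As a rough sketch of how the converting version behaves on the two sample lines from the question: the function name score_eval_from_line and the choice to raise ValueError for a missing field are assumptions made for this illustration, not part of the answer above.

```python
def score_eval_from_line(line):
    """Parse one 'a score=... expect=...' line into (int, float).

    Hypothetical helper for illustration only.
    """
    score = expect = None
    for word in line.split():
        # partition splits on the first '=' and always returns 3 items
        key, _, value = word.partition('=')
        if key == 'score':
            score = int(value)      # ValueError if not an integer
        elif key == 'expect':
            expect = float(value)   # ValueError if not a float
    if score is None or expect is None:
        raise ValueError("missing score or expect in line: %r" % line)
    return score, expect


lines = ["a score=216 expect=1.05e-06", "a score=180 expect=0.0394"]
results = [score_eval_from_line(line) for line in lines]
```

Calling it once per line, rather than only on mafLines[0], gives one (score, expect) pair for each line, which is what the question asks for.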

Upvotes: 2

harto

Reputation: 90493

Obligatory and possibly inappropriate regexp solution:

import re
def scoreEvalFromMaf(mafLines):
    return [re.search(r'score=(.+) expect=(.+)', line).groups()
            for line in mafLines]
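
One thing to watch with this approach: re.search returns None for a line that doesn't match, so calling .groups() on it raises AttributeError. A possible variant that skips such lines is sketched below; the names PATTERN and score_eval_pairs, the use of \S+ instead of .+, and the decision to skip rather than raise are all assumptions for this example.

```python
import re

# \S+ stops at whitespace, so trailing text after the expect value
# is not swallowed into the second group.
PATTERN = re.compile(r'score=(\S+)\s+expect=(\S+)')


def score_eval_pairs(lines):
    """Return a list of (score, expect) string pairs, one per matching line."""
    pairs = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # assumption: silently skip lines without both fields
        pairs.append(match.groups())
    return pairs


pairs = score_eval_pairs(["a score=216 expect=1.05e-06", "s some other line"])
```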

Upvotes: 1

Anthony Briggs

Reputation: 3455

It looks like you want to split each line on whitespace and parse each chunk separately. If mafLine is a string (i.e. one line from .readlines()):

def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval  = word.split('=')[1]

    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)

    return (theScore, theEval)

The way you were doing it iterates over each character of the first line (mafLines is a list of strings, so mafLines[0] is a single string) rather than over each space-separated word.
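
A small illustration of that difference, using the first sample line from the question (the variable names are just for this example). It also shows why word.split('=')[2] in the original function could never work: splitting 'score=216' on '=' gives only two items.

```python
line = "a score=216 expect=1.05e-06"

# Iterating over a string yields single characters, not words.
first_chars = [ch for ch in line][:3]

# split() with no arguments breaks the line on runs of whitespace.
words = line.split()

# Each word holds one key=value pair, so there is no index 2.
parts = words[1].split('=')
```

Here first_chars is ['a', ' ', 's'], words is ['a', 'score=216', 'expect=1.05e-06'], and parts is ['score', '216'].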

Upvotes: 2
