Reputation: 64004
I have lines of data which I want to parse. The data looks like this:
a score=216 expect=1.05e-06
a score=180 expect=0.0394
What I want is a subroutine that parses them and returns two values (score and expect) for each line.
However, this function of mine doesn't seem to work:
def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")
Please advise: what's the right way to do it?
Upvotes: 0
Views: 2011
Reputation: 881635
If mafLines is a list of lines and you want to look only at the first one, call .split() on that line to obtain the words. For example:
def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    # Split the first line on whitespace to get words like "score=216"
    for word in mafLines[0].split():
        if word.startswith('score='):
            # partition returns (before, separator, after)
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval
Note that this returns a tuple of two strings; if you want, say, an int and a float, change the last line to
    return int(theScore), float(theEval)
Then you'll get a ValueError if either string is not valid for the type it's supposed to represent, and a tuple of two numbers if both strings are valid.
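A minimal sketch of the same partition-based approach with the conversion folded in, assuming every line is made of whitespace-separated key=value tokens as in the sample data. The name parse_score_expect is hypothetical, and it raises ValueError instead of a bare Exception so that a missing field and a malformed number can be caught the same way:

```python
def parse_score_expect(line):
    """Return (score, expect) as (int, float) from a line like 'a score=216 expect=1.05e-06'."""
    fields = {}
    for word in line.split():
        # partition always returns a 3-tuple: (before, separator, after)
        key, sep, value = word.partition('=')
        if sep:  # keep only tokens that actually contain '='
            fields[key] = value
    if 'score' not in fields:
        raise ValueError("encountered an alignment without a score")
    if 'expect' not in fields:
        raise ValueError("encountered an alignment without an expect value")
    # int() and float() raise ValueError if the strings are malformed
    return int(fields['score']), float(fields['expect'])


for line in ["a score=216 expect=1.05e-06", "a score=180 expect=0.0394"]:
    print(parse_score_expect(line))
# (216, 1.05e-06)
# (180, 0.0394)
```

Collecting the fields into a dict means the order of score and expect on the line doesn't matter.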
Upvotes: 2
Reputation: 90493
Obligatory and possibly inappropriate regexp solution:
import re

def scoreEvalFromMaf(mafLines):
    return [re.search(r'score=(.+) expect=(.+)', line).groups()
            for line in mafLines]
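One caveat with the regex version: re.search returns None for a line that doesn't match, and calling .groups() on None raises an AttributeError. A sketch that reports such lines explicitly, assuming score always comes before expect on the same line; the name score_evals is hypothetical, and \S+ is swapped in for .+ so each capture stops at the next whitespace:

```python
import re

# \S+ stops at whitespace, so any fields after expect= are not swallowed
SCORE_EXPECT = re.compile(r'score=(\S+)\s+expect=(\S+)')

def score_evals(maf_lines):
    results = []
    for line in maf_lines:
        match = SCORE_EXPECT.search(line)
        if match is None:
            # Fail with a readable message instead of an AttributeError on None
            raise ValueError("no score/expect in line: %r" % line)
        results.append(match.groups())
    return results


print(score_evals(["a score=216 expect=1.05e-06", "a score=180 expect=0.0394"]))
# [('216', '1.05e-06'), ('180', '0.0394')]
```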
Upvotes: 1
Reputation: 3455
It looks like you want to split each line on whitespace and parse each chunk separately. If mafLine is a single string (i.e. one line from .readlines()):
def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    # split() with no argument splits on any run of whitespace
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval = word.split('=')[1]
    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)
    return (theScore, theEval)
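Your version iterates over each character of the first line rather than over space-separated words: mafLines is a list of strings, so mafLines[0] is a single string, and looping over a string yields its characters. No single character starts with "score=", so the loop always falls through to the exception. Even with the words split out, word.split('=')[2] would raise an IndexError, because "score=216" splits into only two parts; the expect value is in a separate word.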
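A quick illustration of the difference, using the first sample line from the question:

```python
line = "a score=216 expect=1.05e-06"

# Iterating over a string yields single characters
print(list(line[:5]))   # ['a', ' ', 's', 'c', 'o']

# str.split() with no argument splits on runs of whitespace
words = line.split()
print(words)            # ['a', 'score=216', 'expect=1.05e-06']

# No single character can start with "score=", but a word can
print(any(ch.startswith("score=") for ch in line))  # False
print(any(w.startswith("score=") for w in words))   # True

# Splitting one word on '=' gives only two parts, so index [2] does not exist
print("score=216".split('='))  # ['score', '216']
```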
Upvotes: 2