Reputation: 895
Is it possible to perform simple math on the output from Python regular expressions?
I have a large file where I need to divide the numbers following a ")" by 100. For instance, I would convert the following line, containing )75 and )2:
((words:0.23)75:0.55(morewords:0.1)2:0.55);
to )0.75 and )0.02:
((words:0.23)0.75:0.55(morewords:0.1)0.02:0.55);
My first thought was to use re.sub with the search expression "\)\d+", but I don't know how to divide the integer following the parenthesis by 100, or whether this is even possible using re.
Any thoughts on how to solve this? Thanks for your help!
Upvotes: 7
Views: 1599
Reputation: 78610
You can do it by providing a function as the replacement:
import re

s = "((words:0.23)75:0.55(morewords:0.1)2:0.55);"
# The captured digits are converted to a float, divided by 100, and re-prefixed with ")"
s = re.sub(r"\)(\d+)", lambda m: ")" + str(float(m.group(1)) / 100), s)
print(s)
# ((words:0.23)0.75:0.55(morewords:0.1)0.02:0.55);
Incidentally, if you wanted to do it using BioPython's Newick tree parser instead, it would look like this:
from Bio import Phylo
# Assuming you want to read from a string rather than a file
from io import StringIO
tree = Phylo.read(StringIO(s), "newick")
for c in tree.get_nonterminals():
    # Only internal nodes that carry a support value are rescaled
    if c.confidence is not None:
        c.confidence = c.confidence / 100
print(tree.format("newick"))
(While this particular operation takes more lines than the regex version, other operations involving trees might be made much easier with it.)
Upvotes: 15
Reputation: 251428
The replacement expression for re.sub can be a function. Write a function that takes the matched text, converts it to a number, divides it by 100, and then returns the string form of the result.
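A minimal sketch of that approach, assuming Python 3 and the sample line from the question. The function name divide_by_100 and the variable names are only illustrative, and note that re.sub passes the function a match object rather than the matched text itself:

```python
import re


def divide_by_100(match):
    # match.group(1) holds the digits captured after the ")"
    value = int(match.group(1)) / 100
    # The ")" is consumed by the pattern, so it has to be put back
    return ")" + str(value)


line = "((words:0.23)75:0.55(morewords:0.1)2:0.55);"
result = re.sub(r"\)(\d+)", divide_by_100, line)
print(result)
```

The output uses Python's default float formatting; if you need a fixed number of decimals, a format specifier such as "{:.2f}".format(value) could replace str(value).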
Upvotes: 1