Reputation: 406
I have a file with the structure:
N1H3O1 C2H2
C1H4 H201
C1H1N1 N1H3
C2N1O1P1H3 P5
What I am trying to do is to count the sum of coefficients in each of the formulae. Thus, the desire output is:
1+3+1 5 2+2 4
1+4 5 2+1 3
1+1+1 3 3+1 4
2+1+1+1+3 8 5 5
What I did is a simple replacement of each letter with "+" and then deleting the first " +".
I however would like to know how to do it in a more proper way in sed, using branch and flow operators.
Upvotes: 0
Views: 68
Reputation: 27743
The problem with your input is the 0
which is used instead of O
, which might make it difficult to design a regular expression for it, which you can see here:
([^A-Z]+)*([0-9]+)
Other than that, you might be able to capture the numbers by simply adding ([^A-Z]+)
.
However, you may not wish to do this task with regular expression, since your data except for that 0
is pretty structured, and you could maybe write a script to do so.
Upvotes: 1