Reputation: 2022
In Python, I would like to extract a substring from a line while preserving the whitespace that occurs inside the substring. For example, given the following lines:
34 -1 1 10 C2H4 + OH = C2H3 + H2O 8.020E+13 0.00 5955.0
35 -0.301029996 0.301029996 2 C2H3 + O2 = CH2O + HCO 4.000E+12 0.00 -250.0
36 -0.477121255 0.477121255 3 C2H3 + HCO = C2H4 + CO 6.034E+13 0.00 0.0
37 -1 1 10 C3H5 = C2H2+CH3 2.397E+48 -9.90 8.208E+04
38 -1 1 10 C2H4(+M) = C2H2+H2(+M) 1.800E+13 0.00 7.600E+04
39 -1 1 10 C2H3+O2 = C2H2+HO2 2.120E-06 6.00 9.484E+03
40 -0.505149978 0.505149978 3.2 C2H3+H = C2H2+H2 2.000E+13 0.00 2.500E+03
41 -0.505149978 0.505149978 3.2 C2H2+H(+M) = C2H3(+M) 3.110E+11 0.58 2.589E+03
42 -1 1 10 C2H2+O2 = HCCO+OH 2.000E+08 1.50 3.010E+04
43 -0.698970004 0.698970004 5 C2H2+O = HCCO+H 1.430E+07 2.00 1.900E+03
44 -1 1 10 C2H2+OH = CH2CO+H 2.190E-04 4.50 -1.000E+03
45 -0.477121255 0.477121255 3 CH2CO+H = CH3+CO 1.100E+13 0.00 3.400E+03
I would like to extract the substring that runs from the fifth field through the fourth field from the end (that is, everything except the first four and last three fields), yielding the chemical reaction with its whitespace intact, like so:
C2H4 + OH = C2H3 + H2O
I tried with split like so, but I lose the whitespace:
chemical_reaction=' '.join(aline.split()[4:-3])
I get:
C2H4 + OH = C2H3 + H2O
Upvotes: 2
Views: 147
Reputation: 785068
You can use this regex:
^\s*(?:\S+\s+){4}(.+?)(?:\s+\S+){3}\s*$
and grab capture group #1, which is matched by the middle part (.+?). On either side of this group we match whitespace-separated words: 4 before it and 3 after it.
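As a minimal sketch, the regex could be applied in Python with the re module as shown here. The function name extract_reaction and the spacing in the sample line are assumptions made for illustration, since the original spacing of the data is not visible in the question.

```python
import re

# Four leading whitespace-separated fields, a lazily captured middle part,
# then three trailing fields. Whitespace inside the capture is left untouched.
PATTERN = re.compile(r'^\s*(?:\S+\s+){4}(.+?)(?:\s+\S+){3}\s*$')

def extract_reaction(line):
    """Return the middle part of the line, or None if the line has too few fields.

    Hypothetical helper name, not from the original post.
    """
    match = PATTERN.match(line)
    return match.group(1) if match else None

# Sample line with irregular spacing (the spacing here is assumed).
sample = "34  -1  1  10  C2H4 +  OH = C2H3   + H2O   8.020E+13  0.00  5955.0"
print(repr(extract_reaction(sample)))  # 'C2H4 +  OH = C2H3   + H2O'
```

Unlike the split/join approach, this does not rebuild the string from tokens, so runs of spaces inside the reaction are kept as they appear in the input.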
Upvotes: 4