terminix00

Reputation: 337

Python regex split by number on line by itself

I'd like to split this text on the numbers that stand alone on a line.

1
root -0.307087 17.6356 -28.2214 2.36076 1.44212 -4.54601
lowerback 15.4094 -0.182495 1.65268
upperback 1.54579 0.0318172 -0.110122
thorax -6.9977 -0.0335751 -1.06068
lowerneck -3.24163 -0.676991 -1.34632
upperneck -9.28199 -0.818331 1.08102
head -2.3551 -0.388697 0.578143
rclavicle 1.74931e-014 -4.77083e-015
rhumerus -42.2757 19.3184 -90.6312
rradius 79.2191
rwrist 2.46902
rhand -35.8906 32.487
rfingers 7.12502
rthumb -9.00425 2.69918
lclavicle 1.74931e-014 -4.77083e-015
lhumerus -46.581 -10.5126 91.072
lradius 108.082
lwrist 30.7395
lhand -39.5085 13.512
lfingers 7.12502
lthumb -12.4939 43.1185
rfemur 4.30283 -1.72433 25.7796
rtibia 82.7602
rfoot 27.83 -8.73877
rtoes 20.2614
lfemur -27.49 -2.09007 -20.1015
ltibia 38.398
lfoot -7.19848 -5.78026
ltoes 5.97973
2
root -0.303728 17.5624 -27.7253 2.02549 1.77071 -4.33872
lowerback 16.0608 -0.380636 1.35189
upperback 1.68665 -0.267024 -0.0539964
thorax -7.21419 -0.169571 -0.765959
lowerneck -2.88855 -0.493739 -1.55908
upperneck -9.88628 -0.567977 1.15901
head -2.623 -0.258251 0.642519
rclavicle -7.65321e-015 -2.38542e-015
rhumerus -42.619 18.2084 -90.2387
rradius 76.8375
rwrist 5.33346
rhand -37.643 32.4997
rfingers 7.12502
rthumb -10.695 2.7919
lclavicle -7.65321e-015 -2.38542e-015
lhumerus -43.8177 -11.0502 91.3641
lradius 108.431
lwrist 30.2025
lhand -38.9758 12.3082
lfingers 7.12502
lthumb -11.9803 41.9454
rfemur 1.76685 -3.0026 24.5235
rtibia 87.0878
rfoot 27.0955 -9.32294
rtoes 22.2194
lfemur -26.5572 -2.78834 -20.4876
ltibia 40.7855
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
lowerback 16.9292 -0.51999 1.14183
upperback 1.81465 -0.483798 -0.143209
thorax -7.55951 -0.270454 -0.690263
lowerneck -2.59928 -0.313935 -1.56078
upperneck -10.5834 -0.320817 1.24057
head -2.91503 -0.136576 0.671345
rclavicle -1.54058e-014 -3.97569e-015
rhumerus -42.9367 16.607 -89.7942
rradius 74.9122
rwrist 7.29535
rhand -38.4744 33.0964
rfingers 7.12502
rthumb -11.4968 3.43167
lclavicle -1.54058e-014 -3.97569e-015
lhumerus -40.8446 -11.9999 91.445
lradius 108.671
lwrist 29.7854
lhand -38.5919 11.658
lfingers 7.12502
lthumb -11.6101 41.3163
rfemur -0.94671 -4.033 23.2605
rtibia 91.2781
rfoot 26.5333 -9.15277
rtoes 23.1538
lfemur -25.0499 -3.27418 -20.9658
ltibia 42.1017
lfoot -12.067 -2.99804
ltoes -2.17676

Ideally, I'd like to get the content between the standalone numbers, excluding the numbers themselves. I've tried this pattern:

r"[0-9]+(?<=)[\r\n]"

where I want to match numbers that have nothing before them on their line and are followed by a newline.

What would be the correct pattern for this?

Upvotes: 2

Views: 49

Answers (1)

Jean-François Fabre

Reputation: 140188

Your regex attempt cannot work, for several reasons. For instance, since it doesn't require a linefeed before the digits, it also matches the trailing digits of any decimal number that ends a line. Also, (?<=) is an empty lookbehind: it always succeeds, so it does nothing and you don't need it.
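Running the question's pattern on the first three lines of the data makes the problem concrete: besides the standalone 1, it also matches the last digits of -4.54601 at the end of the root line. The sample string is simply an excerpt of the question's data.

```python
import re

# Excerpt of the question's data: a standalone frame number and two data lines
# (the last one is not newline-terminated).
sample = (
    "1\n"
    "root -0.307087 17.6356 -28.2214 2.36076 1.44212 -4.54601\n"
    "lowerback 15.4094 -0.182495 1.65268"
)

# The pattern from the question: digits, an empty lookbehind, then CR or LF.
attempt = r"[0-9]+(?<=)[\r\n]"

matches = re.findall(attempt, sample)
print(matches)  # ['1\n', '54601\n']
```

The second match is the tail of a decimal number, not a frame number, which is why the pattern needs a newline (or the start of the text) before the digits.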

I would split with a regex matching a number enclosed between two newlines (with an optional carriage return before each newline, just in case).

test:

import re

text = """rfoot 27.0955 -9.32294
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
rwrist 7.29535
5
rhand -38.4744 33.0964
lradius 108.671
lwrist 29.7854"""


print(re.split(r"\r?\n\d+\r?\n",text))

result: ['rfoot 27.0955 -9.32294\nlfoot -10.1476 -3.85256\nltoes 0.48001', 'root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517\nrwrist 7.29535', 'rhand -38.4744 33.0964\nlradius 108.671\nlwrist 29.7854']
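The optional \r parts are what let the same pattern handle Windows (CRLF) line endings. A quick check on a small, made-up CRLF sample built from the question's data:

```python
import re

# Hypothetical sample: an excerpt of the question's data with CRLF line endings.
crlf_text = "ltoes 0.48001\r\n3\r\nroot -0.294208 17.4728\r\nrwrist 7.29535"

# \r? makes each carriage return optional, so both "\n3\n" and "\r\n3\r\n"
# act as separators; line breaks inside each block are left untouched.
blocks = re.split(r"\r?\n\d+\r?\n", crlf_text)
print(blocks)
# ['ltoes 0.48001', 'root -0.294208 17.4728\r\nrwrist 7.29535']
```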

Note that this simplistic approach doesn't handle the cases where the text starts or ends with a number alone on a line. We have to make the pattern a little more complex by adding ^| and |$ alternatives, but since re.split also returns whatever the capturing groups match, we then get single linefeeds in the result, as well as empty fields. So we can apply a corrective list comprehension to filter out the "blank" fields (maybe it can be done with pure regexes, though):

text = """1
rfoot 27.0955 -9.32294
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
rwrist 7.29535
5
rhand -38.4744 33.0964
lradius 108.671
lwrist 29.7854
4"""


print([x for x in re.split(r"(^|\r?\n)\d+(\r?\n|$)",text) if x.strip()])

result:

['rfoot 27.0955 -9.32294\nlfoot -10.1476 -3.85256\nltoes 0.48001', 'root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517\nrwrist 7.29535', 'rhand -38.4744 33.0964\nlradius 108.671\nlwrist 29.7854']
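The stray linefeeds come from the capturing groups, because re.split interleaves each group's match with the fields. Below is a minimal sketch of two variations, assuming \n line endings (which is what Python gives you when it reads a file in text mode). Making the groups non-capturing with (?:...) leaves only an empty string at each end. Matching the blocks themselves with re.findall, a different technique from splitting, needs no filtering at all. The shortened text is an excerpt of the example above.

```python
import re

text = (
    "1\nrfoot 27.0955 -9.32294\nltoes 0.48001\n3\n"
    "root -0.294208 17.4728\n5\nrhand -38.4744 33.0964\n4"
)

# Capturing groups: their matches ('' and '\n') are interleaved with the fields.
raw = re.split(r"(^|\r?\n)\d+(\r?\n|$)", text)

# Non-capturing groups: only the fields come back; the leading and trailing
# numbers still leave one empty string at each end.
fields = re.split(r"(?:^|\r?\n)\d+(?:\r?\n|$)", text)
cleaned = [f for f in fields if f]

# Matching blocks instead of separators: (?m) lets ^ and $ match at every line
# boundary, and the negative lookahead (?!\d+$) rejects digit-only lines.
# The pattern has no capturing groups, so findall returns whole matches.
blocks = re.findall(r"(?m)^(?!\d+$).+(?:\n(?!\d+$).+)*", text)

print(raw)
print(fields)
print(blocks)
```

With the findall variant, a blank line inside a block would also end that block, since .+ needs at least one character per line.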

Upvotes: 2
