Reputation: 59
I have a string like this:
"8 R-301 288/313 0.24 TT-2079 347.48
"
Now I want to extract 288/313 and 0.24 using a regex, so I wrote this:
r42=re.findall(r"8 +R-301.*",pdf[i])
if len(r42[0].split())>3:
    print(r42[0].split())
    logbook['R-301 Temp']=r42[0].split()[2]
    logbook['R-301 P']=r42[0].split()[3]
So in the ideal case the two numbers land at indexes 2 and 3 of the split result and I can read them directly.
The problem is that 288/313 sometimes contains spaces, like 288 / 313, and the second number 0.24 can contain a slash too, so it can look like 0.24/0.25 or 0.24 /0.25. In those cases the indexes shift and the code above doesn't work.
What would be an ideal regex for these decimal slash numbers with random spaces?
Note: The string can have multiple spaces between characters.
Edit:
Sorry, I forgot one detail:
a number can also be replaced by -. In the example above it could be -/313 instead of 288/313, or - instead of 0.24, or 0.24/- or -/0.24.
Something like this:
"8 R-301 288/- - TT-2079 347.48
"
I want to target those cases as well.
Upvotes: 3
Views: 209
Reputation: 133508
With your shown samples, please try the following regex.
Let's say these are the values:
var="""8 R-301 288/313 0.24/0.25 TT-2079 347.48
8 R-301 288 / 313 0.24/0.25 TT-2079 347.48
8 R-301 288 / 313 0.24 / 0.25 TT-2079 347.48
8 R-301 - / 313 -/ 0.25 TT-2079 347.48
8 R-301 288 /313 -/ 0.25 TT-2079 347.48
8 R-301 313 / - -/ 0.25 TT-2079 347.48
8 R-301 288 / 313 0.24/ 0.25 TT-2079 347.48
8 R-301 123/313 -/12123 TT-2079 347.48
8 R-301 123.12/31.23 -/12123 TT-2079 347.48
8 R-301 123/313 -/- TT-2079 347.48
8 R-301 123 12123 TT-2079 347.48
8 R-301 -/123 -/- TT-2079 347.48"""
Here is the code:
import re
val = re.findall(r'^\d+\s+R-\d+\s+(.*?)T.*',var,re.M)
for i in val:
    print(re.findall(r'((?:\d+(?:\.\d+)?|-)(?:(?:\s+)?\/(?:\s+)?(?:\d+(?:\.\d+)?|-))?)',i,re.M))
Output will be as follows:
['288/313', '0.24/0.25']
['288 / 313', '0.24/0.25']
['288 / 313', '0.24 / 0.25']
['- / 313', '-/ 0.25']
['288 /313', '-/ 0.25']
['313 / -', '-/ 0.25']
['288 / 313', '0.24/ 0.25']
['123/313', '-/12123']
['123.12/31.23', '-/12123']
['123/313', '-/-']
['123', '12123']
['-/123', '-/-']
Explanation: a detailed breakdown of the second regex above:
(                                ## Start of the 1st capturing group.
  (?:\d+(?:\.\d+)?|-)            ## Non-capturing group: digits with an optional decimal part, OR a single -.
  (?:(?:\s+)?\/                  ## Start of a non-capturing group: optional spaces followed by /.
  (?:\s+)?(?:\d+(?:\.\d+)?|-))?  ## Optional spaces, then digits with an optional decimal part OR a single -. The trailing ? makes the whole slash part optional.
)                                ## End of the 1st capturing group.
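For what it's worth, the same breakdown can live inside the pattern itself by compiling it with re.VERBOSE. This is only a sketch: the names VALUE, PAIR and middle are made up for illustration, the sample text is hypothetical, and `(?:\s+)?` is written as the equivalent `\s*`.

```python
import re

# One value: an integer or decimal number, or a lone "-" placeholder.
VALUE = r"(?:\d+(?:\.\d+)?|-)"

# A value, optionally followed by "/" and a second value,
# with any amount of whitespace allowed around the slash.
PAIR = re.compile(
    rf"""
    {VALUE}            # first value
    (?:                # optional second half
        \s*/\s*        # slash with optional surrounding whitespace
        {VALUE}        # second value
    )?
    """,
    re.VERBOSE,
)

# Hypothetical sample: the text between "R-301" and "TT-2079" from one line.
middle = "288 / 313   0.24/ 0.25 "
print(PAIR.findall(middle))  # ['288 / 313', '0.24/ 0.25']
```

Since the compiled pattern has no capturing groups, findall returns the full text of each match, which is what the outer capturing group achieves in the one-line version.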
Upvotes: 4
Reputation: 195438
Try:
import re
tests = [
    "8 R-301 288/313 0.24 TT-2079 347.48",
    "8 R-301 288 / 313 0.24 TT-2079 347.48",
    "8 R-301 288/313 0.24/0.25 TT-2079 347.48",
    "8 R-301 288/313 0.24/ 0.25 TT-2079 347.48",
    "8 R-301 288 / 313 0.24 / 0.25 TT-2079 347.48",
    "8 R-301 288/- - TT-2079 347.48",
    "8 R-301 288/- 0.1 /- TT-2079 347.48",
    "8 R-301 -/233 - TT-2079 347.48",
    "8 R-301 313 -/12123 TT-2079 347.48",
]
r = re.compile(
    r"\s+((?:[\d\.-]+\s*/\s*[\d\.-]+)|[\d\.-]+)\s+([\d\.-]+(?:\s*/\s*[\d\.-]+)?)"
)
for test in tests:
    m = r.search(test)
    if m:
        m = m.groups()
        number1 = m[0].replace(" ", "")
        number2 = m[1].replace(" ", "")
        print(number1, number2)
Prints:
288/313 0.24
288/313 0.24
288/313 0.24/0.25
288/313 0.24/0.25
288/313 0.24/0.25
288/- -
288/- 0.1/-
-/233 -
313 -/12123
EDIT: Updated regex to accept -
EDIT2: Updated regex to accept first value without slash /
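If it helps, here is a rough sketch of how the two captured values could be stored in the logbook dict from the question. It anchors on the literal R-301 tag instead of relying on leading whitespace, and it skips lines where nothing matches. The names FIELD, R301 and extract_r301 are hypothetical, and the sample line is invented.

```python
import re

# A field is a number or "-", optionally followed by "/" and another number or "-".
FIELD = r"[\d.-]+(?:\s*/\s*[\d.-]+)?"

# Anchor on the literal tag so an unrelated line cannot match by accident.
R301 = re.compile(rf"\bR-301\s+({FIELD})\s+({FIELD})")


def extract_r301(line):
    """Return (temp, pressure) with inner spaces removed, or None if absent."""
    m = R301.search(line)
    if m is None:
        return None
    return tuple(part.replace(" ", "") for part in m.groups())


logbook = {}
fields = extract_r301("8 R-301 288 / 313   0.24 /0.25 TT-2079 347.48")
if fields is not None:
    logbook["R-301 Temp"], logbook["R-301 P"] = fields
print(logbook)  # {'R-301 Temp': '288/313', 'R-301 P': '0.24/0.25'}
```

Note that the character class `[\d.-]+` is permissive: it would also accept text such as `1.2.3` or `--`, so a stricter number pattern may be worth it if the input is noisy.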
Upvotes: 3
Reputation: 43169
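I think it is far easier to split on runs of two or more whitespace characters and use the corresponding parts afterwards. This assumes the columns are separated by at least two spaces, while any spaces around a slash are single: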
import re
tests = [
    "8  R-301  288/313  0.24  TT-2079  347.48",
    "8  R-301  288 / 313  0.24  TT-2079  347.48",
    "8  R-301  288/313  0.24/0.25  TT-2079  347.48",
    "8  R-301  288/313  0.24/ 0.25  TT-2079  347.48",
    "8  R-301  288 / 313  0.24 / 0.25  TT-2079  347.48",
    "8  R-301  288/-  -  TT-2079  347.48",
    "8  R-301  288/-  0.1 /-  TT-2079  347.48",
    "8  R-301  -/233  -  TT-2079  347.48",
]
rx = re.compile(r'\s{2,}')
for line in tests:
    parts = rx.split(line)
    print(parts[2])
    print(parts[3])
Which yields
288/313
0.24
288 / 313
0.24
288/313
0.24/0.25
288/313
0.24/ 0.25
288 / 313
0.24 / 0.25
288/-
-
288/-
0.1 /-
-/233
-
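A small sketch of the same idea that also removes the spaces left inside each field, so that 288 / 313 comes out as 288/313. It only works under the assumption that columns are separated by at least two whitespace characters and that spaces around a slash are single. The names COLUMN_GAP, INNER_SPACE and split_fields, and the sample line, are invented for illustration.

```python
import re

COLUMN_GAP = re.compile(r"\s{2,}")   # two or more whitespace characters
INNER_SPACE = re.compile(r"\s+")     # any whitespace left inside a field


def split_fields(line):
    """Split on column gaps, then strip whitespace inside each field."""
    return [INNER_SPACE.sub("", part) for part in COLUMN_GAP.split(line.strip())]


# Hypothetical line: two spaces between columns, single spaces around "/".
line = "8  R-301  288 / 313  0.24 /0.25  TT-2079  347.48\n"
parts = split_fields(line)
print(parts[2], parts[3])  # 288/313 0.24/0.25
```

If the real data ever has only one space between two columns, this split merges them into one field, and a regex-based approach is the safer choice.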
Upvotes: 2