Reputation: 1529
I have a string mixed with numbers and words. I want to be able to extract the numeric values from the string as tokens.
For example, the input
str1 = "Score 1 and 2 sometimes, often 1 and 1/2, or 2.5 or 3 and 1/3."
should ideally produce the output
Score -> word
1 -> number
and -> word
2 -> number
...
1 and 1/2 -> number (this group should stay together as number)
or -> word
2.5 -> number
...
3 and 1/3 -> number
I could partly solve the problem with the following regexes.
Rule 1:
re.findall(r'\s*(\d*\.?\d+)\s*', str1)
Rule 2:
re.findall(r'(?:\s*\d* and \d+\/\d+\s*)', str1)
This partly works, but I could not put the two rules together to solve the problem. I tried this:
re.findall(r'(?:\s*(\d*\.?\d+)\s*)|(?:\s*\d* and \d+\/\d+\s*)', str1)
Can anyone please help and show how I could put the rules together and get the result?
Upvotes: 1
Views: 91
Reputation: 627607
You can use
import re
text = "Score 1 and 2 sometimes, often 1 and 1/2, or 2.5 or 3 and 1/3."
matches = re.findall(r'((\d*\.?\d+(?:\/\d*\.?\d+)?)(?:\s+and\s+(\d*\.?\d+(?:\/\d*\.?\d+)?))?)', text)
result = []
for x, y, z in matches:
    if '/' in x:
        # The whole match contains a fraction: keep it as one item
        result.append(x)
    else:
        # Otherwise add the non-empty captures as separate items
        result.extend(filter(lambda s: s != "", [y, z]))
print(result)
# => ['1', '2', '1 and 1/2', '2.5', '3 and 1/3']
See the Python demo. Here is the regex demo.
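The question also asks for the words to be labelled. As a rough sketch of one way to get both kinds of token in a single pass (this is not part of the answer above; TOKEN_RE and tokenize are hypothetical names, the word pattern [A-Za-z]+ is an assumption, and the mixed-number branch assumes whole numbers on both sides of the fraction, as in the question's rule 2), the alternatives can be tried in order using named groups:

```python
import re

# Same number/fraction sub-pattern as in the answer above.
NUM = r'\d*\.?\d+(?:/\d*\.?\d+)?'

# Hypothetical pattern: try the mixed number ("1 and 1/2") first,
# then a plain number or fraction, then a word.
TOKEN_RE = re.compile(
    rf'(?P<mixed>\d+\s+and\s+\d+/\d+)'
    rf'|(?P<number>{NUM})'
    rf'|(?P<word>[A-Za-z]+)'
)

def tokenize(text):
    """Return (token, kind) pairs, where kind is 'word' or 'number'."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        # m.lastgroup names the alternative that matched.
        kind = 'word' if m.lastgroup == 'word' else 'number'
        tokens.append((m.group(), kind))
    return tokens

text = "Score 1 and 2 sometimes, often 1 and 1/2, or 2.5 or 3 and 1/3."
for token, kind in tokenize(text):
    print(f"{token} -> {kind}")
```

Because the mixed-number alternative comes first, "1 and 1/2" stays together, while "1 and 2" falls through to the plain number and word branches. Punctuation is skipped because no alternative matches it.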
Details:
Each match is a tuple of three captures. The code appends the whole match (Group 1) to the result if it contains a / char, or adds the two other captures as separate items otherwise.
The regex matches
- ( - outer capturing group start (Group 1):
  - (\d*\.?\d+(?:\/\d*\.?\d+)?) - Group 2: a number/fraction pattern: zero or more digits, an optional ., one or more digits, and then an optional occurrence of a / char followed by zero or more digits, an optional ., and one or more digits
  - (?:\s+and\s+(\d*\.?\d+(?:\/\d*\.?\d+)?))? - an optional occurrence of
    - \s+and\s+ - the word and with one or more whitespaces around it
    - (\d*\.?\d+(?:\/\d*\.?\d+)?) - Group 3: number/fraction pattern
- ) - outer capturing group end.
Upvotes: 1
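To make the breakdown of the pattern easier to follow, here is a sketch of the same regex written with re.VERBOSE, so each group carries its own comment. The names PATTERN and extract_numbers are hypothetical, introduced only for this illustration; the matching logic is meant to be the same as in the answer's code.

```python
import re

# The answer's pattern, spread out with re.VERBOSE (whitespace and
# "#" comments inside the pattern are ignored by the regex engine).
PATTERN = re.compile(r'''
    (                                     # Group 1: the whole match
      ( \d*\.?\d+ (?: / \d*\.?\d+ )? )    # Group 2: number or fraction
      (?:                                 # optional tail
        \s+and\s+                         #   "and" with whitespace around it
        ( \d*\.?\d+ (?: / \d*\.?\d+ )? )  #   Group 3: number or fraction
      )?
    )
''', re.VERBOSE)

def extract_numbers(text):
    """Hypothetical wrapper around the answer's post-processing loop."""
    result = []
    for whole, first, second in PATTERN.findall(text):
        if '/' in whole:
            # A fraction is involved: keep the whole match together.
            result.append(whole)
        else:
            # No fraction: emit the non-empty captures separately.
            result.extend(part for part in (first, second) if part)
    return result

print(extract_numbers("Score 1 and 2 sometimes, often 1 and 1/2, or 2.5 or 3 and 1/3."))
# => ['1', '2', '1 and 1/2', '2.5', '3 and 1/3']
```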