I am trying to find all numbers in a text and return them as a list of floats.
My code correctly extracts numbers separated by a comma and a space, as well as numbers attached to words. However, it splits numbers that contain thousands separators (such as 2,137,040) into separate numbers. For example, with the text:
text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"
list(map(int, re.findall(r'\d+', text)))
The suggestions below work beautifully.
Unfortunately, the code below returns a list of strings rather than numbers:
nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)
I need to return the output as a list of floats, separated by commas and without quotation marks.
E.g.
extract_numbers("1, 2, 3, un pasito pa'lante Maria")
should return [1.0, 2.0, 3.0]
Unfortunately, I have not yet been successful in my attempts. Currently, my code reads:
def extract_numbers(text):
    nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
    return "[{0}]".format(', '.join(map(str, nums)))

extract_numbers(TEXT_SAMPLE)
Upvotes: 1
Views: 3406
Reputation: 522824
You may try a re.findall search on the following pattern:
\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)
Sample script:
import re
text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"
nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)
This prints:
['30', '10', '1', '2', '137', '40', '2,137,040']
Here is an explanation of the regex pattern:
\b              word boundary
\d{1,3}         match 1 to 3 leading digits
(?:,\d{3})*     followed by zero or more thousands groups
(?:\.\d+)?      match an optional decimal component
(?!\d)          assert the "end" of the number: the next character must not be a digit
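Since the question asks for a list of floats rather than strings, here is a minimal sketch of one way to convert the matches. It reuses the pattern above and the function name extract_numbers from the question; the constant name NUMBER_PATTERN is only an illustrative choice. It assumes that every comma inside a match is a thousands separator, so the commas can be stripped before calling float().

```python
import re

# Same pattern as above: 1-3 leading digits, optional thousands groups,
# optional decimal part, and no digit immediately after the match.
NUMBER_PATTERN = re.compile(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)')


def extract_numbers(text):
    """Return every number matched in text as a float."""
    # Strip thousands separators so float() can parse each match.
    return [float(match.replace(',', ''))
            for match in NUMBER_PATTERN.findall(text)]


print(extract_numbers("1, 2, 3, un pasito pa'lante Maria"))
# [1.0, 2.0, 3.0]
```

Returning the list itself, instead of a formatted string, is what makes the result print without quotation marks. Note that the pattern does not match runs of four or more digits written without separators, such as 1234.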
Upvotes: 5
Reputation: 5183
Create a pattern with a character class [] that matches digits and commas, then remove the commas before converting each match.
Code:
import re
text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"
out = [
    int(match.replace(',', ''))  # drop thousands separators before converting
    for match in re.findall(r'[\d,]+', text)
]
print(out)
Output
[30, 10, 1, 2, 137, 40, 2137040]
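One caveat with the character class: [\d,]+ also matches a comma that stands on its own, which becomes an empty string after the replace and makes int() raise a ValueError. A hedged sketch of one way around this is to require the match to start with a digit; the helper name parse_ints is hypothetical and not part of the answer above.

```python
import re


def parse_ints(text):
    """Hypothetical helper: extract integers, tolerating thousands separators."""
    # Require a leading digit so a lone comma (as in "a , b") is never matched.
    # A trailing comma (as in "2,") is still matched, but is removed by replace().
    return [int(match.replace(',', ''))
            for match in re.findall(r'\d[\d,]*', text)]


print(parse_ints("a , b costs 1,000 or 2, 3"))
# [1000, 2, 3]
```

Like the original pattern, this treats any comma between digits as a separator, so a list written without spaces such as "1,2" is read as 12.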
Upvotes: 2
Reputation: 4802
You need to match the commas as well, then strip them before converting each match to an integer:
list(map(lambda n: int(n.replace(',', '')), re.findall(r'[\d,]+', text)))
Also, a list comprehension is usually easier to read than map with a lambda:
[int(n.replace(',', '')) for n in re.findall(r'[\d,]+', text)]
Upvotes: 0