user14152659


How can I extract numbers containing commas from strings in Python?

I am trying to find all numbers in a text and return them as a list of floats.

My code correctly extracts numbers separated by a comma and a space, as well as numbers attached to words. However, it splits numbers that contain comma thousands separators (such as 2,137,040) into separate numbers:

text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"

list(map(int, re.findall(r'\d+', text)))

The suggestions in the answers work beautifully.

Unfortunately, the following returns a list of strings rather than numbers:

nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)

I need to return the output as a list of floats, separated by commas and without quotation marks.

E.g.
extract_numbers("1, 2, 3, un pasito pa'lante Maria")
    should return [1.0, 2.0, 3.0]

Unfortunately, I have not yet been successful in my attempts. Currently, my code reads:

def extract_numbers(text):
    nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
    return "[{0}]".format(', '.join(map(str, nums)))

extract_numbers(TEXT_SAMPLE)

Upvotes: 1

Views: 3406

Answers (4)

Tim Biegeleisen

Reputation: 522824

You may try doing a regex re.findall search on the following pattern:

\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)

Sample script:

import re

text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"

nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)

This prints:

['30', '10', '1', '2', '137', '40', '2,137,040']

Here is an explanation of the regex pattern:

\b            word boundary
\d{1,3}       match 1 to 3 leading digits
(?:,\d{3})*   followed by zero or more thousands terms
(?:\.\d+)?    match an optional decimal component
(?!\d)        assert the "end" of the number by checking that no digit follows
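
To get the list of floats the question asks for, the matched strings still need their commas removed before conversion. The following is only a minimal sketch, assuming the function name extract_numbers from the question; the constant NUMBER_PATTERN is a name introduced here for illustration.

```python
import re

# Same pattern as above, compiled once (NUMBER_PATTERN is an illustrative name).
NUMBER_PATTERN = re.compile(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)')

def extract_numbers(text):
    """Return every number found in text as a float."""
    # Strip the thousands separators so float() can parse each match.
    return [float(match.replace(',', '')) for match in NUMBER_PATTERN.findall(text)]

print(extract_numbers("1, 2, 3, un pasito pa'lante Maria"))
# [1.0, 2.0, 3.0]
```

Returning the list itself, rather than a formatted string, is what makes it print without quotation marks. One limitation of this pattern: an ungrouped run of more than three digits, such as 12345, is not matched at all.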

Upvotes: 5

RichieV

Reputation: 5183

Create a pattern with a character class [] that matches both digits and commas, then strip the commas before converting.

Code:

import re

text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"

out = [
    int(match.replace(',', ''))
    for match in re.findall(r'[\d,]+', text)
]
print(out)

Output

[30, 10, 1, 2, 137, 40, 2137040]

Upvotes: 2

Drew Shafer

Reputation: 4802

You need to match the commas as well, then strip them before converting each match to an integer:

list(map(lambda n: int(n.replace(',', '')), re.findall(r'[\d,]+', text)))

Also, a list comprehension is usually more readable here than map with a lambda:

[int(n.replace(',', '')) for n in re.findall(r'[\d,]+', text)]
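
One caveat with the [\d,]+ pattern: it also matches a comma that is not next to any digit, which leaves an empty string after stripping, and int('') raises a ValueError. A minimal sketch of one way to guard against that, assuming a hypothetical helper name extract_ints:

```python
import re

def extract_ints(text):
    """Extract integers, treating commas inside a digit run as separators."""
    results = []
    for match in re.findall(r'[\d,]+', text):
        digits = match.replace(',', '')
        if digits:  # skip matches that consisted only of commas
            results.append(int(digits))
    return results

print(extract_ints("well, 2, 137, and 2,137,040"))
# [2, 137, 2137040]
```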

Upvotes: 0

Elvin Aghammadzada

Reputation: 881

Why not use array = re.findall(r'[0-9]+', text)?

Upvotes: -1
