Stefan S.

Reputation: 23

Identifying Lists of Numbers With Regular Expressions in Python

I'm working with data from an online math tutor program, and I'd like to be able to identify some features of those problems. For example, for the following question:

Find the median of the 7 numbers in the following list:
[22, 13, 5, 16, 4, 12, 30]

I'd like to know:

1. whether the problem includes a list,
2. how long the longest list in the problem is, and
3. how many numbers are in the problem in total.

So the problem above has a list, the list is 7 numbers long, and the problem contains 8 numbers in total (the "7" in the sentence plus the seven list items).

I've written the following regular expression, which is meant to match positive and negative integers and floats, but I can't figure out how to identify a series of numbers that form a list:

r'[-+]?[0-9]+(?:\.[0-9]+)?'
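A quick way to check what a number pattern really matches is to run re.findall over a sample string. This is a minimal sketch, assuming a number is an optional sign, one or more digits, and an optional fractional part with at least one digit; the name NUMBER and the sample text are my own:

```python
import re

# Optional sign, one or more digits, then an optional fractional part.
# The fractional part needs at least one digit, so a sentence-ending
# period ("40.") is not swallowed into the number.
NUMBER = r'[-+]?\d+(?:\.\d+)?'

sample = "Values: 5, -12, +3.75 and 40. Then 0.5."
print(re.findall(NUMBER, sample))
# ['5', '-12', '+3.75', '40', '0.5']
```

Single-digit values such as 5 are matched as well, which matters here because lists like [22, 13, 5, 16, 4, 12, 30] contain them.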

Additionally, the data is poorly formatted; a list of numbers can appear in any of the following forms:

[1, 2, 3]
1, 2, 3
1,2,3.
1,    2,    3,    4,    5
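Every format above boils down to numbers separated by commas, with optional brackets, extra spaces, or a trailing period. One option is therefore to match the run of comma-separated numbers itself and ignore the brackets. This is a minimal sketch, assuming a "list" means two or more comma-separated numbers; the helper name problem_features is hypothetical:

```python
import re

# Assumed definitions: a number is an optionally signed integer or decimal,
# and a list is two or more numbers joined by commas (spaces allowed).
NUMBER = r'[-+]?\d+(?:\.\d+)?'
LIST = NUMBER + r'(?:\s*,\s*' + NUMBER + r')+'


def problem_features(text):
    """Hypothetical helper: (has_list, longest_list_length, total_numbers)."""
    # Each LIST match is one list; split it back into its numbers.
    lists = [re.findall(NUMBER, match) for match in re.findall(LIST, text)]
    longest = max((len(nums) for nums in lists), default=0)
    total = len(re.findall(NUMBER, text))
    return bool(lists), longest, total


problem = """Find the median of the 7 numbers in the following list:
[22, 13, 5, 16, 4, 12, 30]"""
print(problem_features(problem))  # (True, 7, 8)

for fmt in ["[1, 2, 3]", "1, 2, 3", "1,2,3.", "1,    2,    3,    4,    5"]:
    print(fmt, "->", problem_features(fmt))
```

One caveat of this design: a number written with a thousands separator, such as 1,000, would be read as a two-item list.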

I've been working on this for a few days now and can no longer make any progress. Can anyone help? This might not even be a problem for a regex; I'm just not sure how to proceed from here.

Upvotes: 2

Views: 133

Answers (2)

tswei

Reputation: 453

In addition to the answer provided by alfasin, you can do a second search to find substrings enclosed in square brackets, i.e. the bracketed lists:

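import re

s = '''1, 2, 3
       [4, 5, 6]
       3, 2, 1. '''

# bracketed lists; [^\]]* stops at the first ']' so two lists
# on the same line are not merged into one match
l = re.findall(r'\[[^\]]*\]', s)
# number of lists in string
print(len(l))

# length of the longest list found (0 if none were found)
print(max((len(re.findall(r'\d+', i)) for i in l), default=0))

# total number of numbers in the problem
print(len(re.findall(r'\d+', s)))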
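When choosing the longest list with max(), note that max() on a sequence of lists compares them element by element (lexicographically), not by length. Passing key=len makes it compare the number of items instead, as this small sketch with made-up data shows:

```python
# Two lists of matched number strings, as re.findall would return them.
found = [['9'], ['1', '2', '3']]

# Default comparison is lexicographic: '9' > '1', so the 1-item list wins.
print(max(found))           # ['9']

# key=len compares by list length, so the 3-item list wins.
print(max(found, key=len))  # ['1', '2', '3']
```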

Upvotes: 0

Nir Alfasi

Reputation: 53535

Assuming you get the input as a string, you can use re.findall to extract only the numbers from it:

import re

s = """[1, -2, 3]
        1, 2, 3
        1,2,3.
        1,    2,    3,    4,    5"""

res = re.findall(r'-?\d+', s)
print(res)  # ['1', '-2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3', '4', '5']

# and if you want to turn the strings into numbers:
print(list(map(int, res)))  # [1, -2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5]
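The question also mentions decimals, which -?\d+ splits at the decimal point ("3.5" becomes '3' and '5'). Here is a minimal sketch of one way to extend the same idea, assuming decimals are written with a '.' and should become float while whole numbers stay int; to_number is a hypothetical helper:

```python
import re

# Same idea as above, with an optional fractional part added.
NUMBER = r'-?\d+(?:\.\d+)?'


def to_number(token):
    """Hypothetical helper: int for whole numbers, float for decimals."""
    return float(token) if '.' in token else int(token)


s = "[1, -2.5, 3]  1,2,3.  4.75"
print([to_number(t) for t in re.findall(NUMBER, s)])
# [1, -2.5, 3, 1, 2, 3, 4.75]
```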

Upvotes: 2
