Reputation: 23
I'm working with data from an online math tutor program, and I'd like to be able to identify some features of the problems in it. For example, for the following question:
Find the median of the 7 numbers in the following list:
[22, 13, 5, 16, 4, 12, 30]
I'd like to know
1. whether the problem includes a list,
2. how long the longest list in the problem is, and
3. how many numbers are in the problem total.
So for the problem above, it has a list, the list is 7 numbers long, and there are 8 numbers in the problem total.
I've written the following regex script that can identify positive and negative numbers and floats, but I can't figure out how to identify a series of numbers that are in a list:
'[-+]{0,1}[0-9]+\.{0,1}(?! )[0-9]+'
Additionally, the data is poorly formatted. All of the following are possible forms for a list of numbers:
[1, 2, 3]
1, 2, 3
1,2,3.
1, 2, 3, 4, 5
I've been working on this for a few days now and can no longer make any progress. Can anyone help? It might not even be a problem to solve with a regex; I'm just not sure how to go about it from this point.
Upvotes: 2
Views: 133
Reputation: 453
In addition to the answer provided by alfasin, you can do a second search to find substrings enclosed in square brackets:
import re
s = '''1, 2, 3
[4, 5, 6]
3, 2, 1. '''
# non-greedy match, so two bracketed lists on one line are not merged
l = re.findall(r'\[.*?\]', s)
# number of bracketed lists in the string
print(len(l))  # 1
# length of the longest bracketed list (0 if there are none)
print(max([len(re.findall(r'\d+', i)) for i in l] or [0]))  # 3
# total count of numbers in the problem
print(len(re.findall(r'\d+', s)))  # 9
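The bracket search above misses lists written without brackets, such as "1, 2, 3" or "1,2,3.". One way around that, as a rough sketch only: treat any run of two or more comma-separated numbers as a list. The names NUMBER, NUMBER_LIST and problem_features are made up for this example, and it assumes the data has no thousands separators (so "1,000" would be read as a two-item list) and no numbers in scientific notation.

```python
import re

# Optional sign, digits, optional fractional part (assumed number format).
NUMBER = r'[-+]?\d+(?:\.\d+)?'
# A "list" here means two or more numbers separated by commas,
# with optional whitespace around each comma; brackets are ignored.
NUMBER_LIST = r'{0}(?:\s*,\s*{0})+'.format(NUMBER)

def problem_features(text):
    """Return the three features asked about in the question."""
    # Each match is one comma-separated run; split it back into numbers.
    lists = [re.findall(NUMBER, run) for run in re.findall(NUMBER_LIST, text)]
    return {
        'has_list': bool(lists),
        'longest_list': max([len(items) for items in lists] or [0]),
        'total_numbers': len(re.findall(NUMBER, text)),
    }

problem = ("Find the median of the 7 numbers in the following list:\n"
           "[22, 13, 5, 16, 4, 12, 30]")
print(problem_features(problem))
```

For the example problem in the question this reports a list, a longest list of 7, and 8 numbers in total. A lone number followed by ordinary text (like "the 7 numbers") is counted in the total but is not treated as a list.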
Upvotes: 0
Reputation: 53535
Assuming you get the input as a string, you can use re.findall to extract only the numbers from it:
import re
s = """[1, -2, 3]
1, 2, 3
1,2,3.
1, 2, 3, 4, 5"""
res = re.findall(r'-?\d+', s)
print(res)  # ['1', '-2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3', '4', '5']
# and if you want to turn the strings into numbers:
print(list(map(int, res)))  # [1, -2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5]
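The pattern above only picks up integers. Since the question also mentions floats, here is a possible variant, offered as a sketch rather than a definitive pattern: it assumes plain decimal notation (no exponents, no leading-dot values like ".5"), and extract_numbers is a helper name invented for this example.

```python
import re

# Optional sign, digits, then an optional fractional part.
# The fraction requires a digit after the dot, so the sentence-ending
# period in "1,2,3." is not swallowed into the last number.
NUMBER = r'[-+]?\d+(?:\.\d+)?'

def extract_numbers(text):
    """Return the numbers in text as ints or floats, in order of appearance."""
    return [float(tok) if '.' in tok else int(tok)
            for tok in re.findall(NUMBER, text)]

print(extract_numbers("[1, -2.5, 3]\n1,2,3."))  # [1, -2.5, 3, 1, 2, 3]
```

Note that a minus sign directly before a digit is always read as a sign, so "3-2" yields 3 and -2. Whether that is acceptable depends on how the problems are written.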
Upvotes: 2