Reputation: 75
I have data that looks like this:
data = '5.2 -34 435, 34 2.908 3, 50 2 54 3, 40 50'
I am trying to write a regex such that each item in the Python list created by re.findall
contains 3 or fewer numbers from between the commas, as shown above.
If there are more than 3 numbers between two commas, the remaining numbers before the next comma should go into the next item of the list (again holding at most 3 numbers).
Ideally the output for the above data would look like this:
['5.2 -34 435','34 2.908 3','50 2 54','3','40 50']
I tried to write up the following looking at tutorials but it doesn't seem to work too well...
re.findall(r"[-+A-z0-9.\s]{3}", data)
Upvotes: 3
Views: 78
Reputation: 785128
This can be achieved in a single findall call using this regex:
[+-]?\d+(?:\.\d+)?(?:\s+[+-]?\d+(?:\.\d+)?){0,2}
Here [+-]?\d+(?:\.\d+)? matches a signed number that may or may not have a fractional part, and (?:\s+[+-]?\d+(?:\.\d+)?){0,2} matches 0 to 2 more such numbers, each preceded by whitespace. Because \s+ does not match a comma, a match never extends across one.
Code:
import re
data = '5.2 -34 435, 34 2.908 3, 50 2 54 3, 40 50'
rx = re.compile(r'[+-]?\d+(?:\.\d+)?(?:\s+[+-]?\d+(?:\.\d+)?){0,2}')
print(rx.findall(data))
Output:
['5.2 -34 435', '34 2.908 3', '50 2 54', '3', '40 50']
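If the group size might change, the same pattern can be assembled from its number sub-pattern. The following is only a sketch of that idea: the names NUMBER, group_numbers and max_per_group are made up for illustration and are not part of the answer above.

```python
import re

# Sub-pattern for one signed number with an optional fractional part
# (same as in the regex above).
NUMBER = r'[+-]?\d+(?:\.\d+)?'

def group_numbers(text, max_per_group=3):
    """Return runs of at most max_per_group whitespace-separated numbers.

    Hypothetical helper: it builds the regex above with a configurable
    repeat count. max_per_group is assumed to be at least 1.
    """
    # One number, followed by 0 to (max_per_group - 1) more numbers.
    pattern = re.compile(rf'{NUMBER}(?:\s+{NUMBER}){{0,{max_per_group - 1}}}')
    return pattern.findall(text)

data = '5.2 -34 435, 34 2.908 3, 50 2 54 3, 40 50'
print(group_numbers(data))
print(group_numbers(data, max_per_group=2))
```

The pattern contains only non-capturing groups, so findall returns the full matched strings rather than group contents.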
Upvotes: 3
Reputation: 7040
No need to use regex for this. The following code should do what you want:
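def flatten(x):
    # Remove one level of nesting from a list of lists.
    return [item for sublist in x for item in sublist]

def chunk(x, n):
    # Break x into consecutive pieces of at most n items.
    return [x[i:i + n] for i in range(0, len(x), n)]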
data = '5.2 -34 435, 34 2.908 3, 50 2 54 3, 40 50'
chunked = [chunk(x.strip().split(' '), 3) for x in data.split(',')]
output = [' '.join(x) for x in flatten(chunked)]
>>> output
['5.2 -34 435', '34 2.908 3', '50 2 54', '3', '40 50']
We split the data on the commas, and then split each of those pieces into numbers by splitting on the space character. Those sublists of numbers are then broken into chunks of 3. We then flatten one level of nesting away, leaving us with a list of lists, each containing up to 3 numbers (as strings). To get output, we simply join each of these chunks of up to 3 numbers with spaces.
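The same steps can also be folded into a single function. This is a sketch rather than the code from the answer: chunk_groups and size are hypothetical names, and it calls split() with no argument, on the assumption that the numbers may be separated by runs of whitespace rather than exactly one space.

```python
def chunk_groups(text, size=3):
    """Split text on commas, then emit space-joined chunks of up to size numbers."""
    result = []
    for group in text.split(','):
        # split() with no argument splits on any run of whitespace and
        # ignores leading/trailing whitespace, so no strip() is needed.
        numbers = group.split()
        for i in range(0, len(numbers), size):
            result.append(' '.join(numbers[i:i + size]))
    return result

data = '5.2 -34 435, 34 2.908 3, 50 2 54 3, 40 50'
print(chunk_groups(data))
```

Unlike the regex approach, this does not check that the tokens are numbers; it groups whatever whitespace-separated tokens it finds between the commas.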
Upvotes: 2