Reputation: 1122
I have the following string:
s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00) _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 )"
I need this to be split into a list as follows:
['index', '1.0000000e+00 2.0000000e+00 3.0000000e+00',
'_x_', 'error error error',
't', '1.2500000e+02 1.2500000e+02 1.2500000e+02']
I haven't been able to come up with a regex that does this.
Upvotes: 0
Views: 94
Reputation: 11915
Here is a list comprehension that does it (the last item will be an empty string, because s ends with a closing paren):
[item.strip() for item in s.replace("(", ")").split(")")]
Here is the same idea as an explicit loop; the [:-1] slice drops the empty string left over after the final closing paren:
mylist = []
# Turn both parens into one delimiter, split on it, and strip each piece
for item in s.replace("(", ";").replace(")", ";").split(";"):
    mylist.append(item.strip())
print(mylist[:-1])
Output:
['index', '1.0000000e+00 2.0000000e+00 3.0000000e+00', '_x_', 'error error error', 't', '1.2500000e+02 1.2500000e+02 1.2500000e+02']
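Both versions above leave an empty string after the last ")". A minimal sketch, assuming the same s from the question, that filters empty pieces inside the comprehension instead of slicing them off afterwards (the function name split_groups is hypothetical):

```python
s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00) _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 )"

def split_groups(text):
    # Treat "(" and ")" as the same delimiter and split on it
    parts = text.replace("(", ")").split(")")
    # Strip surrounding whitespace and drop pieces that end up empty
    return [part.strip() for part in parts if part.strip()]

print(split_groups(s))
```

Note that the filter would also drop a legitimately empty group such as "( )", which may or may not be what you want.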
Upvotes: 2
Reputation: 183
You can use the following regex to split this string (the very last item of the resulting list will be an empty string):
import re
s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00) _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 ) "
re.split(r"\s*?(?:\(|\))\s*", s)
This results in:
['index', '1.0000000e+00 2.0000000e+00 3.0000000e+00', '_x_', 'error error error', 't', '1.2500000e+02 1.2500000e+02 1.2500000e+02', '']
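A sketch of the same split, assuming the s defined above: the pattern is written as a raw string so that \s and \( reach the regex engine unchanged (newer Python versions warn about invalid escape sequences in ordinary string literals), and the trailing empty string is filtered out.

```python
import re

s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00) _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 ) "

# Split on a "(" or ")" together with any whitespace around it
pattern = re.compile(r"\s*?(?:\(|\))\s*")

# Drop empty pieces, such as the one produced after the final ")"
parts = [p for p in pattern.split(s) if p]
print(parts)
```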
Alternatively, you could use the following regex to extract the string's components and then post-process them (e.g. strip whitespace from each substring). This regex assumes that the string has balanced left/right parentheses:
re.findall(r"(?:(?<=\()[^)]*?(?=\))|[a-z_]+)", s)
It produces the following output:
['index', ' 1.0000000e+00 2.0000000e+00 3.0000000e+00', '_x_', ' error error error ', 't', ' 1.2500000e+02 1.2500000e+02 1.2500000e+02 ']
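The post-processing step mentioned above can be sketched as follows, assuming the same s: findall keeps the whitespace just inside the parens, so each match is stripped afterwards.

```python
import re

s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00) _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 ) "

# Either the text between a "(" and the next ")" (via lookbehind/lookahead),
# or a run of lowercase letters and underscores outside the parens
pattern = re.compile(r"(?:(?<=\()[^)]*?(?=\))|[a-z_]+)")

# Strip the leading/trailing spaces that findall leaves inside each group
components = [m.strip() for m in pattern.findall(s)]
print(components)
```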
Upvotes: 4
Reputation: 19544
Similar to @AlexKotliarov's answer, but just splitting on whitespace and parens
>>> import re
>>> re.split(r'[\s()]+', s)
Output:
['index', '1.0000000e+00', '2.0000000e+00', '3.0000000e+00', '_x_', 'error', 'error', 'error', 't', '1.2500000e+02', '1.2500000e+02', '1.2500000e+02', '']
Explanation: split on one or more (+) characters from the set [...], which contains whitespace (\s) and the parentheses ( and ).
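The same pattern can be written with re.VERBOSE so that the explanation sits next to the regex it describes. This is a sketch assuming the s from the question; note that whitespace inside a character class is still significant under re.VERBOSE, so the class itself is unchanged. As the output above shows, this approach also splits the individual numbers apart rather than keeping each parenthesized group together.

```python
import re

s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00) _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 )"

pattern = re.compile(r"""
    [\s()]+   # one or more characters from the set: whitespace, "(" or ")"
""", re.VERBOSE)

tokens = pattern.split(s)
print(tokens)
```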
Upvotes: 1