Anand

Reputation: 1122

Splitting numbers and strings repeatedly

I have the following string:

s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00)  _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 ) "

I need this to be split into a list as follows:

['index', '1.0000000e+00 2.0000000e+00 3.0000000e+00', 
'_x_', 'error error error',
't', '1.2500000e+02 1.2500000e+02 1.2500000e+02']

I am unable to come up with a regex for doing this.

Upvotes: 0

Views: 94

Answers (3)

jgritty

Reputation: 11915

Here is a list comprehension that does it:

[item.strip() for item in s.replace("(", ")").split(")")]

Here's a longer version that does basically the same thing. Both leave an empty string at the end of the list, so the last item is dropped when printing:

mylist = []
for item in s.replace("(", ";").replace(")", ";").split(";"):
    mylist.append(item.strip())

print(mylist[:-1])

Output:

['index', '1.0000000e+00 2.0000000e+00 3.0000000e+00', '_x_', 'error error error', 't', '1.2500000e+02 1.2500000e+02 1.2500000e+02']

Upvotes: 2

Alex Kotliarov

Reputation: 183

You can use the following regex to split this string (the very last item of the list will be an empty string):

    import re
    s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00)  _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 ) "
    re.split(r"\s*?(?:\(|\))\s*", s)

This results in:

['index', '1.0000000e+00 2.0000000e+00 3.0000000e+00', '_x_', 'error error error', 't', '1.2500000e+02 1.2500000e+02 1.2500000e+02', '']
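If the trailing empty string is unwanted, one option is to filter it out after splitting. This is only a minimal sketch, assuming the same `s` as above; the name `parts` and the simplified pattern `\s*[()]\s*` are illustrative choices, not the regex from the answer:

```python
import re

s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00)  _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 ) "

# Split on each parenthesis together with any whitespace around it,
# then discard empty pieces (such as the one left after the final ")").
parts = [p for p in re.split(r"\s*[()]\s*", s) if p]
print(parts)
```

Filtering with `if p` also drops empty strings anywhere else in the list, which may or may not be what you want if a group such as `( )` can be empty.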

Alternatively, you could use the following regex to extract the string's components and then process them (e.g. strip whitespace from each substring). This regex assumes that the string has balanced left/right parentheses:

re.findall(r"(?:(?<=\()[^)]*?(?=\))|[a-z_]+)", s)

It should produce the following output:

['index', ' 1.0000000e+00 2.0000000e+00 3.0000000e+00', '_x_', ' error error error ', 't', ' 1.2500000e+02 1.2500000e+02 1.2500000e+02 ']
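The post-processing step mentioned above (stripping whitespace from each extracted component) could look roughly like this. It is a sketch under the same balanced-parentheses assumption; the names `raw` and `cleaned` are hypothetical:

```python
import re

s = "index ( 1.0000000e+00 2.0000000e+00 3.0000000e+00)  _x_ ( error error error ) t ( 1.2500000e+02 1.2500000e+02 1.2500000e+02 ) "

# Extract either the text between "(" and ")" or a bare label made of
# lowercase letters and underscores.
raw = re.findall(r"(?:(?<=\()[^)]*?(?=\))|[a-z_]+)", s)

# The parenthesised pieces keep their inner padding, so strip each one.
cleaned = [item.strip() for item in raw]
print(cleaned)
```

Note that the label part `[a-z_]+` only matches lowercase letters and underscores, so labels containing digits or uppercase letters would need a wider character class.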

Upvotes: 4

Peter Gibson

Reputation: 19544

Similar to @AlexKotliarov's answer, but just splitting on whitespace and parentheses, so every number ends up as its own list item:

>>> import re
>>> re.split(r'[\s()]+', s)

Output:

['index', '1.0000000e+00', '2.0000000e+00', '3.0000000e+00', '_x_', 'error', 'error', 'error', 't', '1.2500000e+02', '1.2500000e+02', '1.2500000e+02', '']

Explanation:

The pattern splits on one or more (+) characters from the set [...], which contains whitespace (\s) and the parentheses ( and ).
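To see the character class at work on something smaller, here is a short sketch using a made-up string (`sample` and `tokens` are hypothetical names, not from the question):

```python
import re

sample = "a ( 1 2 ) b"

# Any run of whitespace and/or parentheses counts as a single separator.
tokens = re.split(r"[\s()]+", sample)
print(tokens)  # ['a', '1', '2', 'b']
```

Because the sample ends in a regular character rather than a separator, there is no trailing empty string here, unlike with the original `s`.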

Upvotes: 1
