Reputation:
I have a list of strings such as this:
['z+2-44', '4+55+z+88']
How can I split the strings in the list so that the result looks like this?
[['z','+','2','-','44'],['4','+','55','+','z','+','88']]
I have already tried the split method, but that splits the 44 into 4 and 4, and I am not sure what else to try.
Upvotes: 13
Views: 3177
Reputation: 107287
You could use just the built-in str.replace() and str.split() methods within a list comprehension:
In [34]: lst = ['z+2-44', '4+55+z+88']
In [35]: [s.replace('+', ' + ').replace('-', ' - ').split() for s in lst]
Out[35]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
Note that this is not an efficient approach for longer strings, because each replace() call makes another full pass over the string. In that case a regex is the better way to go.
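As a rough sketch of how the padding idea extends to more operators, the helper below loops over an operator set. The name split_by_padding and the default operator string are made up for illustration; they are not part of the answer above.

```python
def split_by_padding(expr, operators="+-*/"):
    """Surround each single-character operator with spaces, then split on whitespace."""
    for op in operators:
        # One full pass over the string per operator.
        expr = expr.replace(op, f" {op} ")
    return expr.split()


lst = ['z+2-44', '4+55+z+88']
tokens = [split_by_padding(s) for s in lst]
print(tokens)
```

This only handles single-character operators: a two-character operator such as ** would come out as two separate * tokens.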
As another Pythonic approach, you can also use the tokenize module:
In [56]: from io import StringIO
In [57]: import tokenize
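# Keep only non-empty tokens: ENDMARKER (and, on Python 3.8+, an implicit NEWLINE) have an empty string
In [59]: [[t.string for t in tokenize.generate_tokens(StringIO(i).readline) if t.string] for i in lst]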
Out[59]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
The tokenize module provides a lexical scanner for Python source code, implemented in Python. The scanner in this module returns comments as tokens as well, making it useful for implementing “pretty-printers,” including colorizers for on-screen displays.
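To see what the scanner actually emits, a small sketch can print each token's type name next to its text and filter by type rather than by position. The helper name py_tokens and the SKIP set are assumptions for this example only.

```python
import tokenize
from io import StringIO

# Bookkeeping tokens the scanner emits at the end of the input.
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def py_tokens(expr):
    """Return the token strings of expr, dropping end-of-input tokens."""
    readline = StringIO(expr).readline
    return [t.string for t in tokenize.generate_tokens(readline)
            if t.type not in SKIP]


# Show the token type alongside each token string.
for tok in tokenize.generate_tokens(StringIO('z+2-44').readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

print(py_tokens('4+55+((z+88))'))
```

Because this is the Python tokenizer, it only suits input that is lexically valid Python; anything else may raise an error or produce error tokens.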
Upvotes: 5
Reputation: 140188
This will work, using itertools.groupby:
import itertools
z = ['z+2-44', '4+55+z+88']
print([["".join(x) for k, x in itertools.groupby(i, str.isalnum)] for i in z])
output:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
It simply groups consecutive characters by whether they are alphanumeric or not, then joins each group back into a string inside the list comprehension.
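To make the grouping visible, this small sketch prints the key that str.isalnum produces for each run of characters in one of the strings:

```python
import itertools

expr = 'z+2-44'
# Each run of consecutive characters sharing the same isalnum() result becomes one group.
runs = [(key, "".join(group)) for key, group in itertools.groupby(expr, str.isalnum)]
print(runs)
# [(True, 'z'), (False, '+'), (True, '2'), (False, '-'), (True, '44')]
```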
EDIT: the general case of a calculator with parentheses has been asked as a follow-up question here. If z is as follows:
z = ['z+2-44', '4+55+((z+88))']
then with the previous grouping we get:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+((', 'z', '+', '88', '))']]
This is not easy to parse in terms of tokens. So one change is to join the characters only when the group is alphanumeric and otherwise leave them as a list of single characters, flattening at the end with chain.from_iterable:
print([list(itertools.chain.from_iterable(["".join(x)] if k else x for k,x in itertools.groupby(i,str.isalnum))) for i in z])
which yields:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
(Note that the alternate regex answer can also be adapted like this: [re.findall(r'\w+|\W', s) for s in lst]; note the lack of + after \W.)
Also, "".join(list(x)) is slightly faster than "".join(x), but I'll let you add it yourself, to avoid hurting the readability of that already complex expression.
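The regex variant mentioned in the note can be checked against the parenthesised input like this (a minimal sketch reusing the z list from above):

```python
import re

z = ['z+2-44', '4+55+((z+88))']
# \w+ keeps runs of word characters together; \W (no +) emits one symbol at a time.
tokens = [re.findall(r'\w+|\W', s) for s in z]
print(tokens)
# [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
```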
Upvotes: 14
Reputation: 214957
You can use regex:
import re
lst = ['z+2-44', '4+55+z+88']
[re.findall(r'\w+|\W+', s) for s in lst]
# [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
\w+|\W+ matches either a run of word characters (alphanumeric values in your case) or a run of non-word characters (the + and - signs in your case).
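One thing to keep in mind is that \W also matches whitespace, so spaces around an operator end up inside the operator token. If that matters, a possible variation (an assumption here, not part of the answer above) is to match single symbols that are neither word characters nor whitespace. The names TOKEN and tokenize_expr are made up for this sketch.

```python
import re

# Runs of word characters, or a single character that is neither word nor whitespace.
TOKEN = re.compile(r'\w+|[^\w\s]')


def tokenize_expr(expr):
    """Split expr into operands and single-character symbols, ignoring spaces."""
    return TOKEN.findall(expr)


spaced = 'z + 2 - 44'
print(re.findall(r'\w+|\W+', spaced))  # spaces stay attached to the operators
print(tokenize_expr(spaced))           # spaces are dropped
```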
Upvotes: 26
Reputation: 2529
If you want to stick with split (hence avoiding regex), you can provide it with an optional character to split on:
>>> testing = 'z+2-44'
>>> testing.split('+')
['z', '2-44']
>>> testing.split('-')
['z+2', '44']
So, you could whip something up by chaining the split commands.
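A minimal sketch of what chaining could look like: split on + first, then split each piece on -. Note that this keeps only the operands; the operators themselves are discarded.

```python
testing = 'z+2-44'
# Split on '+', then split every resulting chunk on '-', flattening as we go.
operands = [piece for chunk in testing.split('+') for piece in chunk.split('-')]
print(operands)  # ['z', '2', '44']
```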
However, using regular expressions would probably be more readable:
>>> import re
>>> re.split(r'\+|\-', testing)
['z', '2', '44']
This just says "split the string at any + or - character". The backslashes are escape characters: + is a quantifier in a regex, so it must be escaped, while - is only special inside a character class, so escaping it here is harmless but not required.
Lastly, in this particular case, I imagine the goal is something along the lines of "split at every non-alphanumeric character", in which case regex can still save the day:
>>> re.split('[^a-zA-Z0-9]', testing)
['z', '2', '44']
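Both calls above drop the operators, whereas the question asks to keep them. One way to adapt this, sketched here as a suggestion rather than as part of the original answer, is to wrap the delimiter pattern in a capturing group, since re.split then includes the matched delimiters in its result:

```python
import re

testing = 'z+2-44'
# A capturing group makes re.split return the delimiters as well.
with_ops = re.split(r'([+-])', testing)
print(with_ops)  # ['z', '+', '2', '-', '44']

# Adjacent delimiters produce empty strings between them, so filter those out.
nested = '4+55+((z+88))'
with_any = [t for t in re.split(r'([^a-zA-Z0-9])', nested) if t]
print(with_any)
```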
It is of course worth noting that there are a million other solutions, as discussed in some other SO discussions:
Python: Split string with multiple delimiters
Split Strings with Multiple Delimiters?
My answers here are targeted towards simple, readable code and not performance, in honor of Donald Knuth.
Upvotes: -1
Reputation: 92854
An alternative solution using the re.split function:
import re
l = ['z+2-44', '4+55+z+88']
print([list(filter(None, re.split(r'(\w+)', i))) for i in l])
The output:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
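The filter(None, ...) call is there because splitting on a captured \w+ leaves empty strings wherever the input starts or ends with a word. A short sketch showing the raw result before filtering:

```python
import re

raw = re.split(r'(\w+)', 'z+2-44')
print(raw)      # ['', 'z', '+', '2', '-', '44', '']

# filter(None, ...) drops the falsy (empty) strings at both ends.
cleaned = list(filter(None, raw))
print(cleaned)  # ['z', '+', '2', '-', '44']
```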
Upvotes: 6