user7190127


How to properly split this list of strings?

I have a list of strings such as this:

['z+2-44', '4+55+z+88']

How can I split the strings in the list so that the result looks like this:

[['z','+','2','-','44'],['4','+','55','+','z','+','88']]

I have already tried the split method, but that splits the 44 into 4 and 4, and I am not sure what else to try.

Upvotes: 13

Views: 3177

Answers (5)

Kasravnd

Reputation: 107287

You could use just the built-in str.replace() and str.split() methods within a list comprehension:

In [34]: lst = ['z+2-44', '4+55+z+88']

In [35]: [s.replace('+', ' + ').replace('-', ' - ').split() for s in lst]
Out[35]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]

Note, however, that this approach is not efficient for longer strings, since each replace() call makes another pass over the string and builds a new copy. In that case a regex is the better choice.
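As a rough sketch of how the replace-and-split idea extends to more operators, the chain can be written as a loop. The helper name split_on_operators and the default operator set "+-*/" are assumptions made for illustration, not part of the original answer. Each operator adds one more pass over the string, which is why this scales poorly.

```python
def split_on_operators(expr, operators="+-*/"):
    # Hypothetical helper: pad every operator with spaces, one pass per operator.
    for op in operators:
        expr = expr.replace(op, f" {op} ")
    # split() with no argument splits on any run of whitespace.
    return expr.split()


lst = ['z+2-44', '4+55+z+88']
result = [split_on_operators(s) for s in lst]
print(result)
```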

Another Pythonic option is the tokenize module:

In [56]: from io import StringIO

In [57]: import tokenize

In [59]: [[t.string for t in tokenize.generate_tokens(StringIO(i).readline)][:-1] for i in lst]
Out[59]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]

The tokenize module provides a lexical scanner for Python source code, implemented in Python. The scanner in this module returns comments as tokens as well, making it useful for implementing “pretty-printers,” including colorizers for on-screen displays.
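Since the scanner also emits bookkeeping tokens (an end marker, and possibly a newline token with an empty string, depending on the Python version), slicing off the last item may not be enough everywhere. A slightly more defensive sketch filters by token type instead. The helper name tokenize_expr and the KEEP set are assumptions for illustration.

```python
from io import StringIO
import tokenize

# Token types worth keeping for a simple arithmetic expression.
KEEP = {tokenize.NAME, tokenize.NUMBER, tokenize.OP}


def tokenize_expr(expr):
    # generate_tokens expects a readline-style callable that returns str lines.
    tokens = tokenize.generate_tokens(StringIO(expr).readline)
    # Drop NEWLINE / ENDMARKER and similar bookkeeping tokens.
    return [t.string for t in tokens if t.type in KEEP]


lst = ['z+2-44', '4+55+((z+88))']
result = [tokenize_expr(s) for s in lst]
print(result)
```

Because it uses Python's own lexer, this only works for input that is lexically valid Python.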

Upvotes: 5

Jean-François Fabre

Reputation: 140188

This works using itertools.groupby:

import itertools

z = ['z+2-44', '4+55+z+88']

print([["".join(x) for k,x in itertools.groupby(i,str.isalnum)] for i in z])

output:

[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]

It groups consecutive characters by whether they are alphanumeric or not, then joins each group back into a string inside a list comprehension.
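To see what groupby is doing before the join, it can help to print the key next to each group. This is only an illustrative sketch; the variable name pairs is made up here.

```python
import itertools

# Each group is a run of consecutive characters sharing the same isalnum() result.
pairs = [(is_alnum, "".join(group))
         for is_alnum, group in itertools.groupby('z+2-44', str.isalnum)]

for is_alnum, text in pairs:
    print(is_alnum, repr(text))
```

The True groups are the names and numbers, and the False groups are the operators.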

EDIT: the general case of a calculator with parentheses has been asked as a follow-up question. If z is as follows:

z = ['z+2-44', '4+55+((z+88))']

then with the previous grouping we get:

[['z', '+', '2', '-', '44'], ['4', '+', '55', '+((', 'z', '+', '88', '))']]

which is not easy to parse as tokens. One fix is to join a group only when it is alphanumeric, leave it as individual characters otherwise, and flatten the result at the end with chain.from_iterable:

print([list(itertools.chain.from_iterable(["".join(x)] if k else x for k,x in itertools.groupby(i,str.isalnum))) for i in z])

which yields:

[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]

(Note that the alternative regex answer can be adapted in the same way: [re.findall(r'\w+|\W', s) for s in lst]; note the lack of + after \W.)
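A quick sketch of that regex variant on the parenthesised input follows. The second pattern, which also skips whitespace, is an extra assumption not taken from the answer.

```python
import re

z = ['z+2-44', '4+55+((z+88))']

# \W without + emits every non-word character as its own token.
tokens = [re.findall(r'\w+|\W', s) for s in z]
print(tokens)

# Assumed variant: [^\w\s] ignores spaces instead of returning them as tokens.
spaced = re.findall(r'\w+|[^\w\s]', '4 + 55 + ((z + 88))')
print(spaced)
```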

Also, "".join(list(x)) is slightly faster than "".join(x), but I'll leave that for you to add, to keep an already complex expression readable.

Upvotes: 14

akuiper

Reputation: 214957

You can use regex:

import re
lst = ['z+2-44', '4+55+z+88']
[re.findall(r'\w+|\W+', s) for s in lst]
# [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]

\w+|\W+ matches either a run of word characters (the alphanumeric values in your case) or a run of non-word characters (the + and - signs in your case).
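Two consequences of that definition are worth checking on your own data: \w also matches the underscore, and a decimal point is a non-word character, so a number like 3.14 is split in three. The sample input and the extended pattern below are assumptions for illustration only.

```python
import re

sample = '3.14+x_1'

# The pattern from the answer: the decimal point becomes its own token.
basic = re.findall(r'\w+|\W+', sample)
print(basic)

# Assumed extension: try a decimal number first, then fall back to words and symbols.
with_decimals = re.findall(r'\d+\.\d+|\w+|\W', sample)
print(with_decimals)
```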

Upvotes: 26

MrName

Reputation: 2529

If you want to stick with split (and so avoid regex), you can pass it an optional separator to split on:

>>> testing = 'z+2-44'
>>> testing.split('+')
['z', '2-44']
>>> testing.split('-')
['z+2', '44']

So, you could whip something up by chaining the split commands.
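One possible way to chain the splits is a nested comprehension: split on + first, then split each piece on -. This is just a sketch, and note that the operators themselves are discarded.

```python
testing = 'z+2-44'

# Split on '+', then split every resulting chunk on '-'.
parts = [piece
         for chunk in testing.split('+')
         for piece in chunk.split('-')]
print(parts)
```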

However, using regular expressions would probably be more readable:

>>> import re
>>> re.split(r'\+|-', testing)
['z', '2', '44']

This just says "split the string at any + or - character". The backslash before + is an escape, because + has a special meaning in a regex; - is only special inside a character class, so escaping it here is optional.

Lastly, in this particular case, I imagine the goal is something along the lines of "split at every non-alphanumeric character", in which case regex can still save the day:

>>> re.split('[^a-zA-Z0-9]', testing)
['z', '2', '44']
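The splits above drop the operators, while the question wants to keep them. re.split keeps the delimiters in its result when the pattern is wrapped in a capturing group, so a small adjustment along these lines should give the requested output:

```python
import re

lst = ['z+2-44', '4+55+z+88']

# The parentheses form a capturing group, so each delimiter is kept in the result.
result = [re.split(r'([^a-zA-Z0-9])', s) for s in lst]
print(result)
```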

It is of course worth noting that there are many other solutions, as discussed in other SO questions:

Python: Split string with multiple delimiters

Split Strings with Multiple Delimiters?

My answers here are targeted towards simple, readable code rather than performance, in honor of Donald Knuth.

Upvotes: -1

RomanPerekhrest

Reputation: 92854

An alternative solution using the re.split function:

import re

l = ['z+2-44', '4+55+z+88']
print([list(filter(None, re.split(r'(\w+)', i))) for i in l])

The output:

[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
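The filter(None, ...) step is there because splitting on a captured \w+ leaves empty strings wherever the input starts or ends with a word. A short sketch showing the raw result next to the filtered one (the variable names are only illustrative):

```python
import re

# Splitting on the words themselves leaves '' before the first and after the last word.
raw = re.split(r'(\w+)', 'z+2-44')
print(raw)

# filter(None, ...) drops every falsy item, i.e. the empty strings.
cleaned = list(filter(None, raw))
print(cleaned)
```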

Upvotes: 6
