w128

Reputation: 4928

How to convert a string containing a list of values that are not comma-separated to a list?

I'm new to Python and am wondering what the most elegant way is to convert a string of the form "[1 2 3]" to a list. If the string contains a comma-separated list of values, then the solution is simple:

s = "['x', 'y', 'z']"  # avoid naming a variable str, which shadows the built-in type
arr = eval(s)
print(isinstance(arr, list))  # True

However, this solution doesn't work if the list in the string is not comma separated, e.g. "['x' 'y' 'z']".
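
A minimal sketch of how this fails, assuming ast.literal_eval is used as a stand-in for eval (it parses the same list syntax but only accepts literals). The variable names are just for illustration:

```python
import ast

# Adjacent string literals are concatenated by the parser,
# so the result has one element instead of three.
quoted = ast.literal_eval("['x' 'y' 'z']")
print(quoted)  # ['xyz']

# Space-separated numbers are not valid Python syntax at all.
try:
    ast.literal_eval("[1 2 3]")
    numbers_error = None
except SyntaxError as exc:
    numbers_error = exc
print(type(numbers_error).__name__)  # SyntaxError
```

So the quoted case fails silently while the numeric case raises, which is why neither can be handled by evaluating the string directly.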

Is there a common way to solve this without having to manually parse the string? The solution should not be type dependent, e.g. both "[1 2 3]" and "['multiple words 1' 'multiple words 2']" should be converted normally.

Upvotes: 0

Views: 1286

Answers (2)

Roel Schroeven

Reputation: 1822

In this case shlex might be a solution.

import shlex

s = "['x' 'y' 'z']"
# First get rid of the opening and closing brackets
s = s.strip('[]')
# Split the string using shell-like syntax
lst = shlex.split(s)
print(type(lst), lst)

# Prints: <class 'list'> ['x', 'y', 'z']

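But you'll have to check whether it fulfills your requirements. In particular, shlex.split always returns strings, so "[1 2 3]" becomes ['1', '2', '3'] rather than a list of integers.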
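
If typed values are wanted, one possible extension is to pass each token through ast.literal_eval afterwards. This is only a sketch: parse_space_list and to_literal are hypothetical helper names, and since shlex has already removed the quotes, a quoted '1' would also come back as the integer 1.

```python
import ast
import shlex


def parse_space_list(text):
    """Hypothetical helper: split a bracketed, space-separated string."""
    # Remove surrounding whitespace and the outer brackets
    inner = text.strip().strip('[]')
    # shlex.split honours quotes, so quoted items may contain spaces
    return shlex.split(inner)


def to_literal(token):
    """Hypothetical helper: turn '1' into 1, leave other tokens as strings."""
    try:
        return ast.literal_eval(token)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g. a bare word or a phrase): keep the text
        return token


words = parse_space_list("['multiple words 1' 'multiple words 2']")
numbers = [to_literal(t) for t in parse_space_list("[1 2 3]")]
print(words)    # ['multiple words 1', 'multiple words 2']
print(numbers)  # [1, 2, 3]
```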

Upvotes: 2

Jongware

Reputation: 22457

import re

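s = "[1 2 a 'multiple words 1' 'multiple words 2' 'x' 'y' 'z']"
# Strip the outer brackets first
inner = re.sub(r'^\[(.*)\]', r'\1', s)
# Match either a quoted run or a run of non-spaces, then join each (group1, group2) pair
print([''.join(x) for x in re.findall(r"'(.*?)'|(\S+)", inner)])
# Prints: ['1', '2', 'a', 'multiple words 1', 'multiple words 2', 'x', 'y', 'z']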

The first obvious step is to strip the enclosing [...], because the brackets add nothing useful to the result.

The rest of the work is done by the regex in findall, which matches either anything between single quotes or any run of non-space characters.

We don't want the quotes themselves (or do we? – it is not specified) so the regex grouping allows it to return just the inner parts.

Because the pattern has two groups, findall returns pairs with one element empty and one filled (('', '1'), ('', '2') and so on), so the list comprehension joins each pair to keep only the filled part.

This code cannot see the difference between [1 2 3] and ['1' '2' '3'], but that's no problem as such a variant is not specified in the question.
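
If that distinction does matter, a possible variation is to check which group matched and evaluate only the unquoted tokens. This is a sketch under my own assumptions, not something the question asks for: parse_typed and TOKEN are hypothetical names, and ast.literal_eval is used to convert bare tokens such as 1 into numbers.

```python
import ast
import re

# Same pattern as above: a quoted run (group 1) or a run of non-spaces (group 2)
TOKEN = re.compile(r"'(.*?)'|(\S+)")


def parse_typed(text):
    """Hypothetical helper: quoted tokens stay strings, bare tokens are evaluated."""
    inner = re.sub(r'^\[(.*)\]$', r'\1', text.strip())
    result = []
    for match in TOKEN.finditer(inner):
        if match.group(2) is None:
            # First alternative matched: a quoted token, always a string
            result.append(match.group(1))
            continue
        bare = match.group(2)
        try:
            result.append(ast.literal_eval(bare))
        except (ValueError, SyntaxError):
            # Not a Python literal (e.g. a bare word): keep it as text
            result.append(bare)
    return result


print(parse_typed("[1 2 3]"))        # [1, 2, 3]
print(parse_typed("['1' '2' '3']"))  # ['1', '2', '3']
```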

Upvotes: 1
