armara

Append strings to list from string

I'm working with a string that looks something like this (I saved it from an error message):

"['This is one' 'How is two' 'Why is three'\n 'When is four'] not in index"

From this string I would like to extract the quoted substrings into a list:

['This is one', 'How is two', 'Why is three', 'When is four']

So far I have isolated the part between the square brackets (assuming the string is named s):

start = s.index("[") + len("[")
end = s.index("]")
s = s[start:end].replace("\n", "")  # "\n" is the newline character itself
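In the literal above, \n is a real newline character, so the text to strip out is "\n"; "\\n" would instead look for a backslash followed by the letter n. A quick check, sketched under the assumption that s holds exactly the literal shown earlier (if the message was copied from a printed repr, the text may contain a literal backslash-n and "\\n" would be the one to replace):

```python
s = "['This is one' 'How is two' 'Why is three'\n 'When is four'] not in index"

# Slice out the text between the square brackets
start = s.index("[") + len("[")
end = s.index("]")
inner = s[start:end]

# "\n" is one newline character; "\\n" is a backslash followed by "n"
has_newline = "\n" in inner          # True for this literal
has_backslash_n = "\\n" in inner     # False for this literal
print(has_newline, has_backslash_n)  # True False

# Removing the real newline leaves the quoted terms separated by single spaces
cleaned = inner.replace("\n", "")
print(cleaned)
```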

This gives me:

'This is one' 'How is two' 'Why is three' 'When is four'

Now I just need to put them into a list, and this is where I'm having problems. I've tried this:

s = s.split("'")

But it gave me the output

['', 'This is one', ' ', 'How is two', ' ', 'Why is three', ' ', 'When is four', '']

I also tried

s = s.split("'")
s = ' '.join(s).split()

This gave me:

['This', 'is', 'one', 'How', 'is', 'two', 'Why', 'is', 'three', 'When', 'is', 'four']

I've also tried the same thing with .split(" "), which left some stray whitespace entries. I tried list(filter(...)) as well, but it only removes the completely empty strings, not the ones that contain only whitespace.
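A minimal sketch of two ways to finish from here, assuming s holds the bracketed text with the newline already removed and that no term contains an apostrophe. The first keeps the split("'") idea and uses str.strip as the filter predicate, so whitespace-only pieces are dropped along with empty ones. The second uses the standard-library shlex.split (not mentioned in the original thread), which parses the single quotes directly:

```python
import shlex

# Assumed input: the bracketed text after the newline has been removed
s = "'This is one' 'How is two' 'Why is three' 'When is four'"

# Option 1: split on quotes, then drop pieces that are empty or whitespace-only;
# str.strip() returns "" (falsy) for both kinds
parts = [p for p in s.split("'") if p.strip()]

# The same filter written with filter(), passing str.strip as the predicate
parts_filter = list(filter(str.strip, s.split("'")))

# Option 2: shlex.split treats each single-quoted run as one item and skips
# the whitespace (including newlines) between them
parts_shlex = shlex.split(s)

print(parts)        # ['This is one', 'How is two', 'Why is three', 'When is four']
print(parts_shlex)  # ['This is one', 'How is two', 'Why is three', 'When is four']
```

Both approaches break if a term itself contains an apostrophe; shlex.split raises ValueError on an unbalanced quote.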

Answers (1)

Tim Biegeleisen

One approach would be to first extract the text in square brackets, then use re.findall to find all single-quoted terms.

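import re

inp = "['This is one' 'How is two' 'Why is three'\n 'When is four'] not in index"
# Capture everything between the square brackets; DOTALL lets "." match the newline
srch = re.search(r'\[(.*)\]', inp, flags=re.DOTALL)

if srch:
    # Non-greedy .*? stops at the next quote, so each term is captured separately
    matches = re.findall(r"'(.*?)'", srch.group(1))
    print(matches)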

Output:

['This is one', 'How is two', 'Why is three', 'When is four']

Note that the call to re.search uses re.DOTALL mode. This is required because the content in square brackets contains a newline, and by default . does not match a newline.
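To see the difference DOTALL makes, here is a small comparison on the same input. The pattern is the one from the answer; only the flag changes:

```python
import re

inp = "['This is one' 'How is two' 'Why is three'\n 'When is four'] not in index"

# Without DOTALL, "." stops at the newline, so no "]" is reachable and the search fails
without_dotall = re.search(r'\[(.*)\]', inp)
print(without_dotall)  # None

# With DOTALL, "." also matches "\n" and the whole bracketed span is captured
with_dotall = re.search(r'\[(.*)\]', inp, flags=re.DOTALL)
print(repr(with_dotall.group(1)))
```

The inner re.findall does not need DOTALL, since no single-quoted term spans the newline.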
