Keeto

Reputation: 4208

Using regex to get a substring from a string

I have a string in the form of:

integer, integer, a comma separated list of strings, integer

for example:

"0, 0, ['REFERENCED', 'UPTODATE', 'LRU'], 1" 

I want to return this substring ['REFERENCED', 'UPTODATE', 'LRU']

I thought of using split(", ") and then joining the pieces back together, but that gets complicated quickly. How can I do this with a regex?

Upvotes: 0

Views: 319

Answers (4)

Michael Laszlo

Reputation: 12239

There is no need for a regex. Wrap your string in brackets to make a string representation of a list, then use ast.literal_eval to turn it into an actual list.

import ast
s = "0, 0, ['REFERENCED', 'UPTODATE', 'LRU'], 1"
outer_list = ast.literal_eval('[' + s + ']')
inner_list = outer_list[2]
print(inner_list)

You may be tempted to use eval instead of ast.literal_eval. Resist the temptation. Using eval is unsafe because it will evaluate any Python expression, including one that deletes files from your hard drive. ast.literal_eval is much safer because it only accepts Python literals: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None. Anything else raises an exception instead of being executed.
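A minimal sketch of that difference, assuming a hypothetical helper named parse_literal (not part of the ast module): a plain literal parses, while a string containing a function call is rejected rather than run.

```python
import ast

def parse_literal(text):
    """Return the parsed literal, or None if text is not a plain Python literal.

    parse_literal is a hypothetical helper used only for this illustration.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # literal_eval refuses anything that is not a literal (calls, names, ...)
        return None

s = "0, 0, ['REFERENCED', 'UPTODATE', 'LRU'], 1"
print(parse_literal('[' + s + ']')[2])              # the inner list
print(parse_literal("__import__('os').getcwd()"))   # None: the call is never executed
```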

Upvotes: 1

mgilson

Reputation: 310287

Just write a regular expression that captures a group consisting of a [, any characters, and then a ].

>>> import re
>>> s = "0, 0, ['REFERENCED', 'UPTODATE', 'LRU'], 1"
>>> re.search(r'(\[.*\])', s).group(1)
"['REFERENCED', 'UPTODATE', 'LRU']"
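One caveat worth sketching: .* is greedy, so the match runs from the first [ to the last ] in the string. That is fine for the input in the question, which has a single list. The string s2 below is a hypothetical input with two lists, used only to show how the greedy and non-greedy forms differ.

```python
import re

# Hypothetical input containing two bracketed lists (not from the question).
s2 = "0, ['A'], ['B'], 1"

# Greedy: spans from the first '[' to the last ']'.
greedy = re.search(r'\[.*\]', s2).group(0)

# Non-greedy: stops at the first ']' after the opening '['.
lazy = re.search(r'\[.*?\]', s2).group(0)

print(greedy)
print(lazy)
```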

If the input really is this well structured, you could use ast.literal_eval:

>>> import ast
>>> ast.literal_eval(s)[2]
['REFERENCED', 'UPTODATE', 'LRU']

This safely evaluates a string that contains Python literals (here the string parses as a tuple) and pulls the third element out of that tuple.

Upvotes: 2

Padraic Cunningham

Reputation: 180540

s = "0, 0, ['REFERENCED', 'UPTODATE', 'LRU'], 1"
start = s.find("[")
end = s.rfind("]")
print(s[start:end+1])
['REFERENCED', 'UPTODATE', 'LRU']
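The same find/rfind idea can be wrapped in a function that also handles input without brackets, since find and rfind return -1 when nothing is found. This is only a sketch; the name bracketed is hypothetical.

```python
def bracketed(text):
    """Return the substring from the first '[' to the last ']', or None.

    bracketed is a hypothetical helper name used for illustration.
    """
    start = text.find("[")
    end = text.rfind("]")
    # find/rfind return -1 when the character is missing
    if start == -1 or end < start:
        return None
    return text[start:end + 1]

s = "0, 0, ['REFERENCED', 'UPTODATE', 'LRU'], 1"
print(bracketed(s))
print(bracketed("0, 0, 1"))
```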

Upvotes: 1

SierraOscar

Reputation: 17647

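If you're just looking for an expression, try something like the following. The character class includes \s because the example has a space after each comma, and \d is unnecessary because \w already matches digits. Group 1 holds the list contents without the surrounding brackets.

"\[([\w\s,']+)\]"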
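A short sketch of applying a character-class pattern like this with re.search. The class used here allows whitespace, which is an assumption needed to match the spaces after the commas in the question's example; group(0) returns the whole match including the brackets.

```python
import re

# Character class: word characters, whitespace, commas, and single quotes.
# Allowing \s is an assumption so that ", " between items can match.
pattern = re.compile(r"\[[\w\s,']+\]")

s = "0, 0, ['REFERENCED', 'UPTODATE', 'LRU'], 1"
match = pattern.search(s)
result = match.group(0) if match else None
print(result)
```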

Upvotes: 0
