Reputation: 1131
So I want to capture the indices in a string like this:
"Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"
I want to capture the strings string_1, string_2, u2, and 0.
I was able to do this using the following regex:
re.findall("("
"((?<=\[u')|(?<=\['))" # Begins with [u' or ['
"[a-zA-Z0-9_\-]+" # Followed by any letters, numbers, _'s, or -'s
"(?='\])" # Ending with ']
")"
"|" # OR
"("
"(?<=\[)" # Begins with [
"[0-9]+" # Followed by any numbers
"(?=\])" # Ending with ]
")", message)
The problem is that the result includes tuples with empty strings:
[('string_1', '', ''), ('string_2', '', ''), ('u2', '', ''), ('', '', '0')]
Now I can easily filter out the empty strings from the result, but I would like to prevent them from appearing in the first place.
I believe the reason for this is my capture groups. I tried to use ?: in those groups, but then my results were completely gone.
This is how I had attempted to do it:
re.findall("(?:"
"((?<=\[u')|(?<=\['))" # Begins with [u' or ['
"[a-zA-Z0-9_\-]+" # Followed by any letters, numbers, _'s, or -'s
"(?='\])" # Ending with ']
")"
"|" # OR
"(?:"
"(?<=\[)" # Begins with [
"[0-9]+" # Followed by any numbers
"(?=\])" # Ending with ]
")", message)
That results in the following output:
['', '', '', '']
I'm assuming the issue is due to me using lookbehinds along with the non-capturing groups. Any ideas on whether this is possible to do in Python?
Thanks
Upvotes: 1
Views: 402
Reputation: 3405
Regex: (?<=\[)(?:[^'\]]*')?([^'\]]+)
or \[(?:[^'\]]*')?([^'\]]+)
Python code:
import re

def Years(text):
    return re.findall(r'(?<=\[)(?:[^\'\]]*\')?([^\'\]]+)', text)

print(Years('Something bad happened! @ data[u\'string_1\'][u\'string_2\'][\'u2\'][0]'))
Output:
['string_1', 'string_2', 'u2', '0']
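As a side note on why the second attempt in the question returned only empty strings: the outer groups were made non-capturing, but the inner lookbehind alternation is still wrapped in plain parentheses, so re.findall returns that one (always empty) group. The sketch below is an illustration, not code from the question as posted; the names attempt, fixed, attempt_result, and fixed_result are made up here, and the patterns are the question's own, joined into raw strings.

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# The question's second attempt: the outer groups are non-capturing, but the
# inner lookbehind alternation is still a capturing group. A lookbehind
# consumes no text, so that group only ever captures the empty string.
attempt = (
    r"(?:((?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))"
    r"|(?:(?<=\[)[0-9]+(?=\]))"
)

# Same pattern with every group non-capturing. With no capturing groups,
# re.findall returns the full text of each match.
fixed = (
    r"(?:(?:(?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))"
    r"|(?:(?<=\[)[0-9]+(?=\]))"
)

attempt_result = re.findall(attempt, message)
fixed_result = re.findall(fixed, message)
print(attempt_result)  # ['', '', '', '']
print(fixed_result)    # ['string_1', 'string_2', 'u2', '0']
```

So lookbehinds and non-capturing groups do work together in Python; the empty strings come from the one remaining capturing group.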
Upvotes: 1
Reputation: 67988
You can simplify your regex.
(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])
See the demo: https://regex101.com/r/SA6shx/1
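A minimal sketch of using that simplified pattern from Python, assuming the message format shown in the question. The names SIMPLIFIED and extract_indices are hypothetical and only used for illustration.

```python
import re

# The simplified pattern from this answer. It has exactly one capturing
# group, so re.findall returns a flat list of strings instead of tuples.
SIMPLIFIED = re.compile(r"(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])")

def extract_indices(message):
    """Return the keys and indices found inside [...] in the message."""
    return SIMPLIFIED.findall(message)

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"
print(extract_indices(message))  # ['string_1', 'string_2', 'u2', '0']
```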
Upvotes: 1