Reputation: 11365
I have entries like the following:
"<![CDATA[Lorem ipsum feed for an interval of 30 seconds]]>"
How do I get the string between the innermost square brackets, i.e. 'Lorem ipsum feed for an interval of 30 seconds'?
Some of the entries are plain strings and some are delimited by [] as above.
Upvotes: 4
Views: 13780
Reputation: 388
Use the split() method of str. See the code snippet below:
string = "<![CDATA[[[[[Lorem ipsum feed for an interval of 30 seconds]]]]]]]>"
# Take the piece after the last '[', then keep what precedes the first ']' in it.
inner_str = string.split('[')[-1].split(']')[0]
print(inner_str)
Upvotes: 10
Reputation: 2111
Use regular expressions:
import re
string = '<![CDATA[Lorem ipsum feed for an interval of 30 seconds]]>'
reverse = string[::-1]
start = len(string)-re.search(r'\[', reverse).start()
end = re.search(r'\]', string).start()
print(string[start:end])
You should find the text between the last [ and the first ]. In the above code, I use the re.search() function to find the first occurrence of a character. That works for finding the first occurrence of ]. But to find the last occurrence of [, I reverse the string and find its first occurrence there (the position is subtracted from len(string) since it is indexed backward).
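The same "last [ to first ]" idea can also be sketched without reversing the string, using str.rfind() and str.find(). This is only an illustration: the function name innermost is made up, and returning plain entries unchanged is an assumption about what the question wants for strings without brackets.

```python
def innermost(text):
    """Return the text between the last '[' and the first ']' after it.

    Assumption: entries without a bracketed part are returned unchanged.
    """
    start = text.rfind('[')            # index of the last '[', or -1 if absent
    if start == -1:
        return text
    end = text.find(']', start + 1)    # first ']' after that '['
    if end == -1:
        return text
    return text[start + 1:end]


print(innermost('<![CDATA[Lorem ipsum feed for an interval of 30 seconds]]>'))
print(innermost('plain entry'))
```

Searching for ] only after the last [ also avoids picking up a stray ] that appears earlier in the entry.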
Upvotes: 1
Reputation: 474
You can use what is mentioned in the answer to this question, except that in order to get the innermost string you will have to call it recursively.
Modifying the accepted answer, you can achieve it with the following:
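def find_inner(s):
    # Strip one outer layer: everything after the first '[' and before the last ']'.
    temp = s.partition('[')[-1].rpartition(']')[0]
    if not temp:
        # Nothing left to strip, so s is already the innermost string.
        return s
    return find_inner(temp)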
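As a quick check, the function can be run on both kinds of entries from the question. The definition is repeated here only so the snippet runs on its own, and the entries list is made-up sample data.

```python
def find_inner(s):
    # Strip one outer layer: everything after the first '[' and before the last ']'.
    temp = s.partition('[')[-1].rpartition(']')[0]
    if not temp:
        return s
    return find_inner(temp)


# Hypothetical sample entries: one CDATA-wrapped, one plain string.
entries = [
    '<![CDATA[Lorem ipsum feed for an interval of 30 seconds]]>',
    'Lorem ipsum plain entry',
]

for entry in entries:
    print(find_inner(entry))
```

Note that this approach peels brackets from the outside in, so it assumes the brackets are nested; an entry with side-by-side pairs such as 'a[b]c[d]' would not reduce to a single inner string.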
Upvotes: 2