Reputation: 1813
Is there a simple way to extract only part of the text that a regex matches? Assume I have the following sample text:
SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT
My regex is:
compiledRegex = re.compile('\[.*\] value=("|\').*("|\')')
This will obviously return the entire [SOME MORE TEXT] value="ssss"; however, I only want ssss to be returned, since that's what I'm looking for.
I could define a parser function, but I feel that Python provides some simple, Pythonic way to do this.
Upvotes: 0
Views: 128
Reputation: 414235
Your original regex is too greedy: r'.*\]' won't stop at the first ']', and the second '.*' won't stop at '"'. To stop at a character c, you could use [^c] or the non-greedy '.*?':
regex = re.compile(r"""\[[^]]*\] value=("|')(.*?)\1""")
m = regex.search("""SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT""")
print(m.group(2))  # group 1 is the opening quote, group 2 is the value
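To make the greedy versus non-greedy difference concrete, here is a small sketch. The input string with two bracketed sections is made up for illustration and is not from the question; the variable names are likewise only illustrative.

```python
import re

# Hypothetical input with two bracketed sections (an assumption for illustration).
text = '''SOME TEXT [A] value="first" [B] value="second" END'''

# Greedy version, as in the question: each .* runs as far as it can.
greedy = re.compile(r"""\[.*\] value=("|').*("|')""")

# Restricted version: [^]]* cannot cross a ']', .*? stops at the first
# closing quote, and \1 requires the same quote character that opened the value.
lazy = re.compile(r"""\[[^]]*\] value=("|')(.*?)\1""")

greedy_match = greedy.search(text).group(0)   # spans both sections
lazy_value = lazy.search(text).group(2)       # just the first value
all_values = [m.group(2) for m in lazy.finditer(text)]

print(greedy_match)
print(lazy_value)
print(all_values)
```

With a single bracketed section, as in the question's sample, both patterns find a match; the difference only shows up once the input contains more than one ']' or more than one closing quote.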
Upvotes: 0
Reputation: 55009
This is what capturing groups are designed to do.
compiledRegex = re.compile(r'\[.*\] value=(?:"|\')(.*)(?:"|\')')
matches = compiledRegex.search(sampleText) # search, not match: the pattern does not begin at the start of the text
capturedGroup = matches.group(1) # grab contents of first group
The ?: inside the old groups (the parentheses) means that the group is now a non-capturing group; that is, it won't be accessible as a group in the result. I converted them to keep the output simpler, but you can leave them as capturing groups if you prefer (but then you have to use matches.group(2) instead, since the first quote would be the first captured group).
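As a rough sketch of how the group numbering changes, the snippet below compares capturing and non-capturing quote groups on the sample text from the question. It also shows that re.match only succeeds when the pattern matches at the very start of the string, whereas re.search scans the whole string. The variable names are only illustrative.

```python
import re

sample_text = 'SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT'

# Quotes in capturing groups: the value ends up in group 2.
capturing = re.compile(r'''\[.*\] value=("|')(.*)("|')''')

# Quotes in non-capturing groups (?:...): the value is the only group.
non_capturing = re.compile(r'''\[.*\] value=(?:"|')(.*)(?:"|')''')

cap_groups = capturing.search(sample_text).groups()
noncap_groups = non_capturing.search(sample_text).groups()

# match() anchors at position 0, where the text starts with "SOME", not "[".
anchored = non_capturing.match(sample_text)

print(cap_groups)
print(noncap_groups)
print(anchored)
```

Checking the result for None before calling .group() is a sensible precaution whenever the input might not contain the pattern.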
Upvotes: 2