user974896

Reputation: 1813

Extracting the content between the parts of a regex match in Python?

Is there a simple way to extract only the text that sits between the fixed parts of a regex match? Assume I have the following sample text:

 SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT

My regex is:

 compiledRegex = re.compile('\[.*\] value=("|\').*("|\')')

This obviously matches the entire [SOME MORE TEXT] value="ssss", but I only want ssss returned, since that is the part I'm looking for.

I could obviously write a parser function, but I suspect Python provides a simple, Pythonic way to do this.

Upvotes: 0

Views: 128

Answers (2)

jfs

Reputation: 414235

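Your original regex is too greedy: r'.*\]' won't stop at the first ']', and the second '.*' won't stop at the first closing quote. To stop at a character c you could use '[^c]*' or the non-greedy '.*?'; a \1 backreference then makes the closing quote match the opening one: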

import re

regex = re.compile(r"""\[[^]]*\] value=("|')(.*?)\1""")

Example

m = regex.search("""SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT""")
print(m.group(2))  # group 1 is the quote character, group 2 is the value: ssss
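
To make the greediness point concrete, here is a minimal sketch comparing the two styles. The input line is a made-up variation on the sample text (two bracketed values plus a trailing quoted word), not something from the question, and the names greedy and lazy are only for illustration.

```python
import re

# Hypothetical input: two values on one line and a trailing quoted word.
text = 'SOME TEXT [A] value="first" [B] value="second" MORE "TEXT"'

# Greedy: '.*' runs to the last ']' and then to the last matching quote.
greedy = re.compile(r"""\[.*\] value=("|')(.*)\1""")
# Bounded and non-greedy: stops at the first ']' and the first closing quote.
lazy = re.compile(r"""\[[^]]*\] value=("|')(.*?)\1""")

greedy_value = greedy.search(text).group(2)
lazy_value = lazy.search(text).group(2)

print(greedy_value)  # second" MORE "TEXT
print(lazy_value)    # first
```

On the original one-value sample both patterns happen to give ssss, which is probably why the greedy version looks fine until the input grows.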

Upvotes: 0

Michael Madsen

Reputation: 55009

This is what capturing groups are designed to do.

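import re

sampleText = 'SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT'
compiledRegex = re.compile(r'\[.*\] value=(?:"|\')(.*)(?:"|\')')
matches = compiledRegex.search(sampleText) # search(), not match(): the '[' is not at the start of the text
capturedGroup = matches.group(1) # grab contents of first group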

The ?: at the start of the original groups (the parenthesized quote alternatives) makes them non-capturing groups; that is, they won't be accessible as groups in the result. I converted them to keep the output simpler, but you can leave them as capturing groups if you prefer. You would then have to use matches.group(2) instead, since the opening quote would be the first captured group.
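
As a rough illustration of how the group numbering shifts, the sketch below runs three variants against the sample text from the question. The named group (?P<value>...) is an extra option not mentioned above, and the variable names are just placeholders.

```python
import re

sample_text = 'SOME TEXT [SOME MORE TEXT] value="ssss" SOME MORE TEXT'

# Quote groups left as capturing groups: the value is group 2.
capturing = re.search(r"""\[.*\] value=("|')(.*)("|')""", sample_text)
# Quote groups made non-capturing with ?: so the value is group 1.
non_capturing = re.search(r"""\[.*\] value=(?:"|')(.*)(?:"|')""", sample_text)
# Named group (added here for illustration): the value is looked up by name.
named = re.search(r"""\[.*\] value=(?:"|')(?P<value>.*)(?:"|')""", sample_text)

print(capturing.groups())      # ('"', 'ssss', '"')
print(non_capturing.groups())  # ('ssss',)
print(named.group("value"))    # ssss
```

A named group can be easier to live with if the pattern is likely to gain more groups later, since the lookup no longer depends on position.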

Upvotes: 2
