Reputation: 117
I'm trying to process some SQL code to find the parts of a select statement that would need to be grouped further down in the query. For example:
In the string "Select person, age, name, sum(count distinct arrests) from..."
I would want "sum(count"
returned, because it's the only part of this string that has white space on either side and includes an open parenthesis.
I have been trying different things but am struggling.
I've tried re.compile(r'\W.*[)]') and am getting either way too much back or nothing at all.
Upvotes: 2
Views: 162
Reputation: 38502
How about a non-regex way, using split() and a list comprehension:
some_list = "Select person, age, name, sum(count distinct arrests) from...".split(' ')
matching = [s for s in some_list if "(" in s][0]
print(matching) # sum(count
some_list = "COUNT(DISTINCT(case when etc...)".split(' ')
matching = [s for s in some_list if "(" in s][0]
print(matching) # COUNT(DISTINCT(case
WORKING DEMO: https://rextester.com/ZKJU83182
Upvotes: 0
Reputation: 163362
If the match can also occur at the start of the string, you could use lookarounds to assert that there is no non-whitespace character (\S) directly to the left or to the right of the match, and use a repeating non-capturing group (?:...)+ to match the inner pattern one or more times.
(?<!\S)(?:\w+\(\w+)+(?!\S)
That will match COUNT(DISTINCT(case
and sum(count
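As a rough sketch of how this pattern could be used from Python (the variable names here are only illustrative, and the second string is the example from the other answer), re.findall returns the whole match because the group is non-capturing:

```python
import re

# Pattern from this answer: no non-whitespace char on either side,
# with one or more repetitions of word( word
pattern = re.compile(r"(?<!\S)(?:\w+\(\w+)+(?!\S)")

queries = [
    "Select person, age, name, sum(count distinct arrests) from...",
    "COUNT(DISTINCT(case when etc...)",
]

# One list of matches per query string
matches = [pattern.findall(q) for q in queries]
for found in matches:
    print(found)
```

The nested case works because the regex engine backtracks inside \w+ so that the repeated group can split "COUNT(DISTINCT(case" into two repetitions.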
Upvotes: 0
Reputation: 82765
Use the pattern (\w+\(\w+)\s+
Example:
import re
s = "Select person, age, name, sum(count distinct arrests) from..."
print(re.search(r"(\w+\(\w+)\s+", s).group(1))
Output:
sum(count
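One thing to keep in mind: re.search returns None when nothing matches, so calling .group(1) directly would raise an AttributeError on a string without such a token. A minimal sketch with a guard, assuming a hypothetical helper name first_paren_token that is not part of the original answer:

```python
import re

def first_paren_token(sql):
    # Hypothetical helper: returns the first word(word token that is
    # followed by whitespace, or None if there is no such token.
    m = re.search(r"(\w+\(\w+)\s+", sql)
    return m.group(1) if m else None

found = first_paren_token("Select person, age, name, sum(count distinct arrests) from...")
missing = first_paren_token("Select person, age from...")
# The pattern has no left boundary, so a nested call is only partly matched
nested = first_paren_token("COUNT(DISTINCT(case when etc...)")

print(found)    # sum(count
print(missing)  # None
print(nested)   # DISTINCT(case
```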
Upvotes: 1