Chuck

Reputation: 1293

How to match either a word or a sentence in this Python regex?

I have a decent familiarity with regex but this is tricky. I need to find instances like this from a SQL case statement:

when col_name = 'this can be a word or sentence'

I can match the above when it's just one word, but when it's more than one word it's not working.

s = """when col_name = 'a sentence of words'"""

x = re.search("when\s(\w+)\s*=\s*\'(\w+)", s)

if x:
    print(x.group(1)) # this returns "col_name"
    print(x.group(2)) # this returns "a"

I want group(2) to return "a sentence of words", but I'm only getting the first word. That part could be either one word or several. How can I match both cases?

When I add in the second \', then I get no match:

x = re.search("when\s(\w+)\s*=\s*\'(\w+)\'", s)

Upvotes: 1

Views: 127

Answers (1)

Wiktor Stribiżew

Reputation: 626691

\w+ matches only letters, digits and connector punctuation ("word" chars), so it stops at the first space. Instead, let the Group 2 pattern match any characters other than a single quotation mark:

import re
s = """when col_name = 'a sentence of words'"""
x = re.search(r"when\s+(\w+)\s*=\s*'([^']+)", s)
if x:
    print(x.group(1)) # this returns "col_name"
    print(x.group(2)) # this returns "a sentence of words"

[^'] is a negated character class that matches any character except a single quotation mark, so [^']+ consumes everything up to the next quote.
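This also explains why adding the closing quote to the \w+ version gave no match. A minimal sketch comparing the two patterns with the closing quote required (the variable names word_only and any_text are made up for this example):

```python
import re

s = """when col_name = 'a sentence of words'"""

# \w+ cannot cross the space after "a", so the closing quote
# is never reached and the whole search fails
word_only = re.search(r"when\s+(\w+)\s*=\s*'(\w+)'", s)

# [^']+ runs up to the next single quote, so the closing quote
# can be required without breaking the match
any_text = re.search(r"when\s+(\w+)\s*=\s*'([^']+)'", s)

print(word_only)          # None
print(any_text.group(1))  # col_name
print(any_text.group(2))  # a sentence of words
```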

In case the string can contain escaped single quotes, you may consider replacing ([^']+) with one of the following:

  • If the escape char is ' (the quote is doubled): ([^']*(?:''[^']*)*)
  • If the escape char is \ (backslash): ([^\\']*(?:\\.[^'\\]*)*)
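As a rough sketch of how these two alternatives behave, here they are dropped into the full pattern with a closing quote appended. The sample strings s1 and s2 and the names doubled and backslashed are assumptions for illustration, not part of the original question:

```python
import re

# Escape style 1: a quote inside the literal is doubled ('')
doubled = r"when\s+(\w+)\s*=\s*'([^']*(?:''[^']*)*)'"
# Escape style 2: a quote inside the literal is preceded by a backslash (\')
backslashed = r"when\s+(\w+)\s*=\s*'([^\\']*(?:\\.[^'\\]*)*)'"

# Hypothetical inputs, one per escape style
s1 = "when col_name = 'it''s a sentence' then 1"
s2 = r"when col_name = 'it\'s a sentence' then 1"

m1 = re.search(doubled, s1)
m2 = re.search(backslashed, s2)

print(m1.group(2))  # it''s a sentence
print(m2.group(2))  # it\'s a sentence
```

The captured text still contains the escape sequences, so if you need the plain value you would unescape it yourself afterwards, e.g. m1.group(2).replace("''", "'").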

Note the use of the raw string literal (the r prefix) to define the regex pattern: all backslashes are treated as literal backslashes inside it, so \s and \w reach the regex engine unchanged.
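A quick way to see what the r prefix changes, as a small illustrative check (the names raw and escaped are made up here):

```python
import re

raw = r"when\s+(\w+)"
escaped = "when\\s+(\\w+)"  # same pattern, every backslash doubled by hand

# Both literals produce the identical string, so they compile to the same regex
print(raw == escaped)  # True

# For escapes Python itself recognizes, the difference is visible:
print(len(r"\n"), len("\n"))  # 2 1
```

Without the r prefix, "\s" only happens to work because Python leaves unrecognized escapes alone, and recent Python versions emit a warning for it.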

Upvotes: 1
