bootica
bootica

Reputation: 771

Regex to match following pattern in SQL query

I am trying to extract parts of a MySQL query to get the information I want.

I used this code / regex in Python:

import re
query = "SELECT `asd`.`ssss` as `column1`, `ss`.`wwwwwww` from `table`"
table_and_columns = re.findall('\`.*?`[.]\`.*?`',query)

My expected output:

['`asd`.`ssss`', `ss`.`wwwwwww`']

My real output:

['`asd`.`ssss`', '`column1`, `ss`.`wwwwwww`']

Can anybody help me and explain me where I went wrong? The regex should only find the ones that have two strings like asd and a dot in the middle.

PS: I know that this is not a valid query.

Upvotes: 0

Views: 401

Answers (3)

The fourth bird
The fourth bird

Reputation: 163632

The dot . can also match a backtick, so the pattern starts by matching a backtick and is able to match all chars until it reaches the literal dot in [.]

There is no need to use non greedy quantifiers, you can use a negated character class only prevent crossing the backtick boundary.

`[^`]*`\.`[^`]*`

Regex demo

The asterix * matches 0 or more times. If there has to be at least a single char, and newlines and spaces are unwanted, you could add \s to prevent matching whitespace chars and use + to match 1 or more times.

`[^`\s]+`\.`[^`\s]+`

Regex demo | Python demo

For example

import re
query = "SELECT `asd`.`ssss` as `column1`, `ss`.`wwwwwww` from `table`"
table_and_columns = re.findall('`[^`\s]+`\.`[^`\s]+`',query)
print(table_and_columns)

Output

['`asd`.`ssss`', '`ss`.`wwwwwww`']

Upvotes: 1

Anna Slastnikova
Anna Slastnikova

Reputation: 1558

The thing is that .*? matches any character (except for line terminators) even whitespaces. Also as you're already using * which means either 0 or unlimited occurrences,not sure you need to use ?. So this seems to work:

\`\S+\`[.]\`\S+\`

where \S is any non-whitespace character. You always can check you regexes using https://regex101.com

Upvotes: 0

Liju
Liju

Reputation: 2313

Please try below regex. Greedy nature of .* from left to right is what caused issue.
Instead you should search for [^`]*

`[^`]*?`\.`[^`]*?`

Demo

Upvotes: 0

Related Questions