Reputation: 771
I am trying to extract parts of a MySQL query to get the information I want.
I used this code / regex in Python:
import re
query = "SELECT `asd`.`ssss` as `column1`, `ss`.`wwwwwww` from `table`"
table_and_columns = re.findall('\`.*?`[.]\`.*?`',query)
My expected output:
['`asd`.`ssss`', `ss`.`wwwwwww`']
My real output:
['`asd`.`ssss`', '`column1`, `ss`.`wwwwwww`']
Can anybody help me and explain me where I went wrong?
The regex should only find the ones that have two strings like asd
and a dot in the middle.
PS: I know that this is not a valid query.
Upvotes: 0
Views: 401
Reputation: 163632
The dot .
can also match a backtick, so the pattern starts by matching a backtick and is able to match all chars until it reaches the literal dot in [.]
There is no need to use non greedy quantifiers, you can use a negated character class only prevent crossing the backtick boundary.
`[^`]*`\.`[^`]*`
The asterix *
matches 0 or more times. If there has to be at least a single char, and newlines and spaces are unwanted, you could add \s
to prevent matching whitespace chars and use +
to match 1 or more times.
`[^`\s]+`\.`[^`\s]+`
For example
import re
query = "SELECT `asd`.`ssss` as `column1`, `ss`.`wwwwwww` from `table`"
table_and_columns = re.findall('`[^`\s]+`\.`[^`\s]+`',query)
print(table_and_columns)
Output
['`asd`.`ssss`', '`ss`.`wwwwwww`']
Upvotes: 1
Reputation: 1558
The thing is that
.*?
matches any character (except for line terminators) even whitespaces.
Also as you're already using *
which means either 0 or unlimited occurrences,not sure you need to use ?
.
So this seems to work:
\`\S+\`[.]\`\S+\`
where \S
is any non-whitespace character.
You always can check you regexes using https://regex101.com
Upvotes: 0