P A N

Reputation: 5922

Regex captures more than intended

I have written this simple regex to capture the six characters after "WKN ", but I must be doing something wrong because the result also includes "WKN".

search_reply = "WKN A12BHF, IS3R"

wkn = re.search(r"WKN\s(.{6})", search_reply)

>>> "WKN A12BHF"

For this example, I would like to keep only "A12BHF".

Upvotes: 2

Views: 299

Answers (4)

Timo Richter

Reputation: 174

A common regex for finding a WKN, even when it is not preceded by "WKN":

re.search(r"(?<!\S)[A-Z0-9]{6}(?!\S)", text)
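One caveat with the question's sample string: the WKN there is followed by a comma, so the `(?!\S)` lookahead rejects it. The sketch below compares that pattern with a word-boundary variant. The `\b` version is an assumption of mine, not part of the answer above, and the names `strict` and `relaxed` are only illustrative.

```python
import re

text = "WKN A12BHF, IS3R"

# Pattern from the answer: needs whitespace (or a string edge) on both sides,
# so the comma directly after "A12BHF" makes the lookahead fail.
strict = re.search(r"(?<!\S)[A-Z0-9]{6}(?!\S)", text)

# Assumed alternative: word boundaries tolerate trailing punctuation.
relaxed = re.search(r"\b[A-Z0-9]{6}\b", text)

print(strict)           # None
print(relaxed.group())  # A12BHF
```

Note that either pattern will also match any other six-character run of capitals and digits, so anchoring on the "WKN" prefix is safer when it is available.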

Upvotes: 1

vks

Reputation: 67968

wkn = re.search(r"WKN\s(.{6})", search_reply).group(1)

That should do it. Your regex is correct. The part you want is captured by the parentheses as group 1, while the whole match (group 0) also includes "WKN".
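To make the difference between the whole match and the captured group concrete, here is a minimal sketch using the sample string from the question. The `None` check and the variable names `match` and `whole` are my additions, since `re.search` returns `None` when nothing matches.

```python
import re

search_reply = "WKN A12BHF, IS3R"

match = re.search(r"WKN\s(.{6})", search_reply)

if match:  # re.search returns None when the pattern is not found
    whole = match.group(0)  # entire match, including "WKN "
    wkn = match.group(1)    # only the first parenthesised group
else:
    whole = wkn = None

print(whole)  # WKN A12BHF
print(wkn)    # A12BHF
```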

Upvotes: 2

anubhava

Reputation: 785128

You can use a positive lookbehind here:

>>> re.search(r"(?<=WKN\s).{6}", search_reply).group()
'A12BHF'

(?<=WKN\s) asserts that the six-character text is preceded by "WKN" and a whitespace character, without including them in the match.
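One thing to be aware of, shown as a small sketch: Python's standard `re` module only accepts fixed-width lookbehinds, so `\s` works here but `\s+` would not. The flag name `variable_width_ok` is just illustrative.

```python
import re

search_reply = "WKN A12BHF, IS3R"

# Fixed-width lookbehind: "WKN" plus exactly one whitespace character.
wkn = re.search(r"(?<=WKN\s).{6}", search_reply).group()

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=WKN\s+).{6}")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(wkn)                # A12BHF
print(variable_width_ok)  # False
```

If the amount of whitespace after "WKN" can vary, a capturing group such as `WKN\s+(.{6})` avoids this restriction.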

Upvotes: 1

Aleksander Monk

Reputation: 2907

import re

search_reply = "WKN A12BHF, IS3R"

wkn = re.search(r"(WKN\s)(.{6})", search_reply)

print(wkn.group(2))

Try this: group(2) holds only the six characters after "WKN ".

Upvotes: 2
