Reputation: 2795
I have a list of strings in the following format:
target:
'TLS 1.2 x67 DHE-RSA-AES128-SHA256 DH 2048 AES128 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256'
'TLS 1 x67 DHE-RSA-AES128-SHA256 DH 2048 AES128 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256'
'TLS 1.1 x67 DHE-RSA-AES128-SHA256 DH 2048 AES128 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256'
I want to know whether the exact match 'TLS 1' (and not TLS 1.1 or TLS 1.2) exists in a line.
I have tried solutions from similar posts, as follows:
# returns all the lines, including TLS 1.1, TLS 1.2 ...
lines = []
for i in target:
    if re.match(r'\bTLS 1\b', i):
        lines.append(i)
I also tried:
# returns nothing
lines = []
for i in target:
    if re.match(r'^TLS 1$', i):
        lines.append(i)
I also tried many other variations with search, findall, etc. How can I grab only the lines that contain an exact match, and only an exact match, of a given word?
Upvotes: 1
Views: 2446
Reputation: 626802
You may consider the following approaches.
TLS as a whole word should have a word boundary right in front of it, so that part is covered in your pattern.
If there must be a whitespace right after 1, or end of string, it is more efficient to use a negative lookahead (?!\S): r'\bTLS 1(?!\S)'. You may also use r'\bTLS 1(?:\s|$)'.
If you just want to ensure there is no digit or fractional part after 1, use r'\bTLS 1(?!\.?\d)'. This matches TLS 1 only when it is not immediately followed by a digit or by . + digit.
import re

target = ['TLS 1.2 x67 DHE-RSA-AES128-SHA256 DH 2048 AES128 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256',
          'TLS 1 x67 DHE-RSA-AES128-SHA256 DH 2048 AES128 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256',
          'TLS 1.1 x67 DHE-RSA-AES128-SHA256 DH 2048 AES128 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256']
lines = []
for i in target:
    if re.match(r'\bTLS 1(?!\.?\d)', i):
        lines.append(i)
print(lines)
Output:
['TLS 1 x67 DHE-RSA-AES128-SHA256 DH 2048 AES128 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256']
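The two lookaheads above are not interchangeable, and a small comparison may make the difference easier to see. The following is only a sketch: the sample strings and the helper name matching are made up for illustration and are not part of the question's data. It uses re.search rather than re.match so that TLS 1 would also be found when it is not at the start of a line.

```python
import re

# Hypothetical sample lines, chosen to show where the two lookaheads differ.
samples = [
    "TLS 1 x67",    # version 1 followed by a space
    "TLS 1.2 x67",  # fractional version
    "TLS 1, x67",   # version 1 followed by punctuation
    "TLS 1",        # version 1 at the very end of the string
    "TLS 12 x67",   # a longer number starting with 1
]

# Next character must be whitespace or the end of the string.
strict = re.compile(r"\bTLS 1(?!\S)")
# Next characters must not be a digit or "." followed by a digit.
loose = re.compile(r"\bTLS 1(?!\.?\d)")


def matching(pattern, lines):
    """Return the lines in which the compiled pattern is found anywhere."""
    return [line for line in lines if pattern.search(line)]


strict_hits = matching(strict, samples)
loose_hits = matching(loose, samples)
print(strict_hits)  # ['TLS 1 x67', 'TLS 1']
print(loose_hits)   # ['TLS 1 x67', 'TLS 1, x67', 'TLS 1']
```

The only sample on which they disagree is 'TLS 1, x67': (?!\S) rejects any non-whitespace character after 1, while (?!\.?\d) only rejects a digit or a fractional part.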
Upvotes: 2
Reputation: 521194
Wiktor commented before I posted this (not surprising), but the marker for an exact match in this case is actually a space following TLS 1. A word boundary is not specific enough, because that would also pick up things like TLS 1.1, which you don't want. So try this version:
# returns only the lines where TLS 1 is followed by whitespace
lines = []
for i in target:
    if re.match(r'\bTLS 1\s', i):
        lines.append(i)
If the TLS text could possibly be the very last thing in a line, then we can try using this:
re.match(r'\bTLS 1(?=(\s|$))', i)
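As a rough illustration of why the end-of-line case matters, the sketch below runs both patterns over three made-up strings (they are assumptions, not the question's data). The last string consists of TLS 1 and nothing else.

```python
import re

# Hypothetical lines; the last one ends immediately after "TLS 1".
samples = ["TLS 1 x67", "TLS 1.1 x67", "TLS 1"]

space_only = re.compile(r"\bTLS 1\s")             # requires a whitespace character
space_or_end = re.compile(r"\bTLS 1(?=(\s|$))")   # whitespace or end of string

# re.match semantics: the pattern must match at the start of each string.
space_only_hits = [s for s in samples if space_only.match(s)]
space_or_end_hits = [s for s in samples if space_or_end.match(s)]

print(space_only_hits)    # ['TLS 1 x67']
print(space_or_end_hits)  # ['TLS 1 x67', 'TLS 1']
```

The \s version needs an actual character after 1, so it misses the bare 'TLS 1' string, while the lookahead version accepts it.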
Upvotes: 2