Reputation: 89
I have a text file in which each line looks something like this:
GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4
Each line has the keyword testcaseid followed by a test case id (in this case blt12_0001 is the id, and s3 and n4 are some parameters). I want to extract blt12_0001 from the above line. Each test case id contains exactly one underscore '_'. What would be a regex for this case, and how can I store the test case id in a variable?
Upvotes: 1
Views: 448
Reputation: 27723
Another option that might work would be:
import re
expression = r"[^_\r\n]+_[^_\r\n]+(?=(?:_[a-z0-9]{2}){2}$)"
string = '''
GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4
GeneralBKT_n24_-e_dee_testcaseid_blt81_0023_s4_n5
'''
print(re.findall(expression, string, re.M))
Output:
['blt12_0001', 'blt81_0023']
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. You can also watch there how it would match against some sample inputs.
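For readers who prefer to see the parts of the expression in place, here is a minimal sketch of the same pattern written with re.VERBOSE so that each piece carries a comment. The names pattern, text, and ids are only illustrative. Note the assumption built into the lookahead: the id must be followed by exactly two parameters of two characters each, so a line ending in something like _s10_n4 would not match.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can be commented.
pattern = re.compile(
    r"""
    [^_\r\n]+                 # first half of the id: one or more non-underscore characters
    _                         # the single underscore inside the id
    [^_\r\n]+                 # second half of the id
    (?=                       # lookahead: the id must be followed by ...
        (?:_[a-z0-9]{2}){2}   # ... two parameters, each an underscore plus two characters
        $                     # ... and then the end of the line (re.MULTILINE)
    )
    """,
    re.VERBOSE | re.MULTILINE,
)

text = """
GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4
GeneralBKT_n24_-e_dee_testcaseid_blt81_0023_s4_n5
"""

ids = pattern.findall(text)
print(ids)  # ['blt12_0001', 'blt81_0023']

# A three-character parameter breaks the lookahead, so nothing is found.
no_match = pattern.findall("GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s10_n4")
print(no_match)  # []
```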
Upvotes: 1
Reputation: 43169
You could make use of capturing groups:
testcaseid_([^_]+_[^_]+)
In Python, this could be:
import re
line = "GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4"
# group(1) is the part captured by the parentheses, i.e. the test case id
for match in re.finditer(r'testcaseid_([^_]+_[^_]+)', line):
    print(match.group(1))
See a demo on ideone.com.
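To address the second half of the question (storing the id in a variable), the capturing-group approach could be wrapped in a small helper along these lines. This is only a sketch: extract_ids, TESTCASE_ID, sample_lines, and testcase_ids are made-up names, and the sample list stands in for the lines of the real text file.

```python
import re

# Capturing-group pattern from the answer above, compiled once for reuse.
TESTCASE_ID = re.compile(r"testcaseid_([^_]+_[^_]+)")

def extract_ids(lines):
    """Return the test case id from every line that contains one."""
    ids = []
    for line in lines:
        match = TESTCASE_ID.search(line)
        if match:
            ids.append(match.group(1))
    return ids

# Stand-in for the lines of the text file (trailing newlines included on purpose).
sample_lines = [
    "GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4\n",
    "GeneralBKT_n24_-e_dee_testcaseid_blt81_0023_s4_n5\n",
]

testcase_ids = extract_ids(sample_lines)
print(testcase_ids)  # ['blt12_0001', 'blt81_0023']
```

Because a file object is iterable line by line, the same helper should also work when passed an open file, for example the handle returned by open() on whatever file holds the lines.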
Upvotes: 2
Reputation: 613
You can use this regex to capture the test case id in your format:
(?<=testcaseid_)[^_]+_[^_]+
This matches text that contains exactly one underscore and is preceded by testcaseid_, using a positive lookbehind. Here [^_]+ matches one or more characters other than an underscore, followed by _, and then [^_]+ again matches one or more characters other than _.
Check out this Python code:
import re
# "lines" instead of "list" to avoid shadowing the built-in
lines = ['GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4', 'GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s6_n9']
for s in lines:
    grp = re.search(r'(?<=testcaseid_)[^_]+_[^_]+', s)
    if grp:
        print(grp.group())
Output:
blt12_0001
blt12_0001
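As a side-by-side illustration of what the lookbehind changes, the sketch below runs the lookbehind pattern and a capturing-group equivalent on the same line and stores the result in a variable. The variable names are illustrative only.

```python
import re

line = "GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4"

# Lookbehind: the prefix is checked but not consumed, so group() is the id itself.
lookbehind = re.search(r"(?<=testcaseid_)[^_]+_[^_]+", line)

# Capturing group: the prefix is part of the match, so the id sits in group(1).
captured = re.search(r"testcaseid_([^_]+_[^_]+)", line)

# Store the id in a variable, falling back to None when the line has no id.
testcase_id = lookbehind.group() if lookbehind else None

print(testcase_id)        # blt12_0001
print(captured.group(0))  # testcaseid_blt12_0001
print(captured.group(1))  # blt12_0001
```

Python's re module only accepts fixed-width lookbehinds; testcaseid_ is a fixed string, so the pattern is valid here.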
Upvotes: 1