mandar munagekar

Reputation: 89

How to copy subsequent text after matching a pattern?

I have a text file in which each line looks something like this:

GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4

Each line has the keyword testcaseid followed by a test case id (here blt12_0001 is the id, and s3 and n4 are parameters). I want to extract blt12_0001 from the line above. Each test case id contains exactly one underscore '_'. What regex would work for this case, and how can I store the test case id in a variable?

Upvotes: 1

Views: 448

Answers (3)

Emma

Reputation: 27723

Another option that might work, assuming every line ends with exactly two two-character parameters (such as _s3_n4), would be:

import re
expression = r"[^_\r\n]+_[^_\r\n]+(?=(?:_[a-z0-9]{2}){2}$)"

string = '''

GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4
GeneralBKT_n24_-e_dee_testcaseid_blt81_0023_s4_n5

'''

print(re.findall(expression, string, re.M))

Output

['blt12_0001', 'blt81_0023']


If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.
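As a rough sketch of the assumption this expression makes, the snippet below compares it with a keyword-anchored pattern on a line whose first parameter is three characters long. The line with _s10 and the name keyword_based are made up for illustration and do not come from the question.

```python
import re

# Pattern from this answer: relies on the line ending in two 2-character fields.
suffix_based = r"[^_\r\n]+_[^_\r\n]+(?=(?:_[a-z0-9]{2}){2}$)"
# Keyword-anchored alternative (an assumption for comparison, not part of this answer).
keyword_based = r"(?<=testcaseid_)[^_\r\n]+_[^_\r\n]+"

short_params = "GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4"
long_params = "GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s10_n4"  # hypothetical line

suffix_short = re.findall(suffix_based, short_params, re.M)
suffix_long = re.findall(suffix_based, long_params, re.M)
keyword_long = re.findall(keyword_based, long_params, re.M)

print(suffix_short)  # ['blt12_0001']
print(suffix_long)   # [] because "_s10" is not a 2-character field
print(keyword_long)  # ['blt12_0001']
```

If the parameters can vary in length, anchoring on the testcaseid keyword is probably the safer choice.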


Upvotes: 1

Jan

Reputation: 43169

You could make use of capturing groups:

testcaseid_([^_]+_[^_]+)

See a demo on regex101.com.


One of many possible ways in Python could be

import re

line = "GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4"

# "match" avoids shadowing the built-in id(); group(1) holds the captured test case id
for match in re.finditer(r'testcaseid_([^_]+_[^_]+)', line):
    print(match.group(1))

See a demo on ideone.com.
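To cover the second part of the question (storing the id in a variable), here is a minimal sketch built on the same capturing group. The helper name extract_testcase_id and the sample list are hypothetical; in practice the lines would come from iterating over the opened text file.

```python
import re
from typing import Optional

# Compiled once; group 1 is the id: two underscore-free runs joined by one "_".
TESTCASE_ID = re.compile(r"testcaseid_([^_]+_[^_]+)")


def extract_testcase_id(line: str) -> Optional[str]:
    """Return the test case id found in line, or None if the keyword is absent."""
    match = TESTCASE_ID.search(line)
    return match.group(1) if match else None


# Hypothetical sample data standing in for the lines of the text file.
lines = [
    "GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4",
    "a line without the keyword",
]

testcase_id = extract_testcase_id(lines[0])  # the id is now stored in a variable
print(testcase_id)                     # blt12_0001
print(extract_testcase_id(lines[1]))   # None
```

Returning None for lines without the keyword lets the caller decide whether to skip them or treat them as errors.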

Upvotes: 2

Silvanas

Reputation: 613

You can use this regex to match the test case id in your format:

(?<=testcaseid_)[^_]+_[^_]+

This matches text that contains exactly one underscore and is preceded by testcaseid_, which is checked with a positive lookbehind. Here the first [^_]+ matches one or more characters other than an underscore, _ matches the single underscore, and the second [^_]+ again matches one or more characters other than an underscore.

Check out this demo

Check out this Python code,

import re

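# "lines" avoids shadowing the built-in list type
lines = ['GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s3_n4', 'GeneralBKT_n24_-e_dee_testcaseid_blt12_0001_s6_n9']

for s in lines:
    # the lookbehind keeps "testcaseid_" out of the match, so group() is just the id
    m = re.search(r'(?<=testcaseid_)[^_]+_[^_]+', s)
    if m:
        print(m.group())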

Output,

blt12_0001
blt12_0001

Upvotes: 1
