The following is a string that I would like to parse:
a=''' //TS_START
/*TG_HEADER_START
title="XYX"
ident=""
*/
/*
<TC_HEADER_START>
title=" Halted after Tester Connect"
ident="TC1"
variants="A C"
name="TC">
TestcaseDescription= This >
TestcaseRequirements=36978
StakeholderRequirements=1236
TestcaseParameters:
TS_Implemented=Yes;
TS_Automation=Automated;
TS_Techniques= Testing;
TS_Priority=1;
TS_Tested_By=qz9ghv;
TS_Review_done=Yes;
TS_Regression=No
TestcaseTestType=Test
</TC_HEADER_END>
<TC_HEADER_START>
title=" Halted after Tester Connect"
ident="TC1"
variants="A C"
name="TC">
TestcaseDescription= This >
TestcaseRequirements=36978
StakeholderRequirements=1236
TestcaseParameters:
TS_Implemented=Yes;
TS_Automation=Automated;
TS_Techniques= Testing;
TS_Priority=1;
TS_Tested_By=qz9ghv;
TS_Review_done=Yes;
TS_Regression=No
TestcaseTestType=Test
</TC_HEADER_END>
*/
testcase TC_GEEA2_VGM_DOIP_01(char strDescription[], char strReq[], char strParams[])
{
}
/*TG_HEADER_END*/
zd.a.S,D.,AS'
A/S,D/.A.SD./
//<TS_END>'''
I would like to parse the string and get a list of the substrings that start with <TC_HEADER_START> and end with </TC_HEADER_END>. I tried the following regex, but instead of returning each block separately it returns a single match that runs from the first <TC_HEADER_START> to the last </TC_HEADER_END>.
aa=re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>',a)
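To see why this happens, here is a minimal sketch on a shortened sample string (the `sample` text is a made-up stand-in for `a`, not the real data). The greedy `[\s\S]*` first consumes the rest of the string and then backtracks only as far as the last closing tag, so both blocks land in one match:

```python
import re

# Hypothetical, shortened stand-in for the string `a`: two header blocks.
sample = (
    "<TC_HEADER_START>\nident=\"TC1\"\n</TC_HEADER_END>\n"
    "<TC_HEADER_START>\nident=\"TC2\"\n</TC_HEADER_END>\n"
)

# Greedy quantifier: the match extends to the LAST </TC_HEADER_END>,
# swallowing the first closing tag and the second opening tag along the way.
greedy = re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>', sample)

print(len(greedy))                         # 1
print('TC1' in greedy[0] and 'TC2' in greedy[0])  # True
```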
Expected output:
aa=['<TC_HEADER_START>
title=" Halted after Tester Connect"
ident="TC1"
variants="A C"
name="TC">
TestcaseDescription= This >
TestcaseRequirements=36978
StakeholderRequirements=1236
TestcaseParameters:
TS_Implemented=Yes;
TS_Automation=Automated;
TS_Techniques= Testing;
TS_Priority=1;
TS_Tested_By=qz9ghv;
TS_Review_done=Yes;
TS_Regression=No
TestcaseTestType=Test
</TC_HEADER_END>','<TC_HEADER_START>
title=" Halted after Tester Connect"
ident="TC1"
variants="A C"
name="TC">
TestcaseDescription= This >
TestcaseRequirements=36978
StakeholderRequirements=1236
TestcaseParameters:
TS_Implemented=Yes;
TS_Automation=Automated;
TS_Techniques= Testing;
TS_Priority=1;
TS_Tested_By=qz9ghv;
TS_Review_done=Yes;
TS_Regression=No
TestcaseTestType=Test
</TC_HEADER_END>']
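Use a non-greedy .*? together with the re.S flag, which lets . match newlines as well. See the documentation for re.M and re.S: https://docs.python.org/3/library/re.html?highlight=re.S#re.MULTILINE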
import re
aa=re.findall(r'<TC_HEADER_START>(.*?)</TC_HEADER_END>',a,re.S)
print(len(aa))
print(aa[0])
Output:
2
title=" Halted after Tester Connect"
ident="TC1"
variants="A C"
name="TC">
TestcaseDescription= This >
TestcaseRequirements=36978
StakeholderRequirements=1236
TestcaseParameters:
TS_Implemented=Yes;
TS_Automation=Automated;
TS_Techniques= Testing;
TS_Priority=1;
TS_Tested_By=qz9ghv;
TS_Review_done=Yes;
TS_Regression=No
TestcaseTestType=Test
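A quick way to check that the re.S flag is what makes this work: in the sketch below (the `sample` string is a hypothetical shortened input), the same pattern finds nothing without the flag, because every block spans several lines and `.` does not match `\n` by default. re.M only changes how `^` and `$` behave, so it has no effect on this particular pattern.

```python
import re

# Hypothetical shortened input; the real string `a` has many more lines.
sample = (
    "<TC_HEADER_START>\ntitle=\"A\"\n</TC_HEADER_END>\n"
    "<TC_HEADER_START>\ntitle=\"B\"\n</TC_HEADER_END>\n"
)
pattern = r'<TC_HEADER_START>(.*?)</TC_HEADER_END>'

# Without re.S, '.' stops at '\n', so no block can be matched.
without_flag = re.findall(pattern, sample)

# With re.S (alias re.DOTALL), '.' also matches '\n'; the lazy '*?'
# stops at the first closing tag, giving one result per block.
with_flag = re.findall(pattern, sample, re.S)

print(without_flag)        # []
print(len(with_flag))      # 2
print(repr(with_flag[0]))  # '\ntitle="A"\n'
```

Note that the captured text starts and ends with a newline; call `.strip()` on each item if that is unwanted.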
Your regex is almost correct: you want a lazy quantifier (*?) instead of a greedy one (*), so that each match stops at the first </TC_HEADER_END> rather than the last one.
Try this:
<TC_HEADER_START>([\s\S]*?)</TC_HEADER_END>
Or try it on regex101.
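Because `[\s\S]` means "any whitespace or any non-whitespace character", it matches every character including `\n`, so this version needs no flag. A minimal sketch (with a hypothetical shortened `sample` string) showing that it gives the same result as `.*?` with re.S:

```python
import re

# Hypothetical shortened input with two header blocks.
sample = (
    "<TC_HEADER_START>\nident=\"TC1\"\n</TC_HEADER_END>\n"
    "<TC_HEADER_START>\nident=\"TC2\"\n</TC_HEADER_END>\n"
)

# [\s\S]*? : lazy "any character", newlines included, no flag required.
lazy = re.findall(r'<TC_HEADER_START>([\s\S]*?)</TC_HEADER_END>', sample)

# .*? with re.S : equivalent here, because re.S lets '.' match '\n'.
dotall = re.findall(r'<TC_HEADER_START>(.*?)</TC_HEADER_END>', sample, re.S)

print(len(lazy))       # 2
print(lazy == dotall)  # True
```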
If you want to include the enclosing tags, wrap them in capturing groups, too:
(<TC_HEADER_START>)([\s\S]*?)(</TC_HEADER_END>)
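One thing to keep in mind with re.findall: when a pattern has more than one capturing group, it returns a list of tuples (one item per group), not a list of strings. The sketch below (using a hypothetical shortened `sample` string) shows that, and also that dropping the groups entirely makes findall return each whole match with its tags, which is the list of strings shown in the question's expected output:

```python
import re

# Hypothetical shortened input with two header blocks.
sample = (
    "<TC_HEADER_START>\nident=\"TC1\"\n</TC_HEADER_END>\n"
    "<TC_HEADER_START>\nident=\"TC2\"\n</TC_HEADER_END>\n"
)

# Three capturing groups -> findall yields one 3-tuple per match.
parts = re.findall(r'(<TC_HEADER_START>)([\s\S]*?)(</TC_HEADER_END>)', sample)
print(parts[0])
# ('<TC_HEADER_START>', '\nident="TC1"\n', '</TC_HEADER_END>')

# No capturing groups -> findall yields the whole match, tags included.
blocks = re.findall(r'<TC_HEADER_START>[\s\S]*?</TC_HEADER_END>', sample)

# Joining each tuple gives the same strings.
print([''.join(p) for p in parts] == blocks)  # True
```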