Extract surrounding XML Tags by child content

Question

I have an XML file that basically looks like this:


  
      
        
      
      NumberOne
  
  
      NumberTwo

What I want to do is extract a complete top-level Product node by searching for

SEARCH_TEXT

So for example, for NumberOne I would get the surrounding Product (Id="1") tags and their content.

Example: for the search text "NumberOne" the desired result is:


      
        
      
      NumberOne

for the search text "NumberTwo" it would be


      NumberTwo

What I tried is this regex (Python):

)[\S|\s])*NumberOne((?!)[\S|\s])*

But this does not work because of the nested Products. Does anyone have a hint for solving this?

I read that regex is not the smartest approach for this kind of XML search problem. In reality the top-level Products are way more complex, and I need to merge two XML files that look like my example. So I was hoping that with regex I could solve this at the "string" level rather than at the XML parser level, where I might need to prepare those complex objects before generating the final XML output. I just want to find the top-level Product by that Identifier value and grab it completely, no matter what else it contains.

Thanks a lot.

UPDATE: Based on Jack Fleeting's solution - this is what I ended up using (XPath):

//products//Product[Attribute[@Name="Identifier" and text()="NumberOne"]]
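For anyone who wants to stay in the Python standard library: as far as I know, xml.etree.ElementTree only supports a subset of XPath and does not accept a predicate like the one above (text() combined with "and"), so here is a minimal sketch that does the same filtering by hand. The sample document is an assumption: the element and attribute names are inferred from the XPath, while the Id values and the nested identifier "Inner" are made up. The helper name find_products is hypothetical too.

```python
import xml.etree.ElementTree as ET

# Assumed sample: names come from the XPath above; the Id values and the
# nested identifier "Inner" are invented for illustration.
SAMPLE = """<products>
  <Product Id="1">
    <Product Id="2">
      <Attribute Name="Identifier">Inner</Attribute>
    </Product>
    <Attribute Name="Identifier">NumberOne</Attribute>
  </Product>
  <Product Id="3">
    <Attribute Name="Identifier">NumberTwo</Attribute>
  </Product>
</products>"""


def find_products(root, identifier):
    """Return every Product (at any depth) that has a direct child
    <Attribute Name="Identifier"> whose text equals `identifier`."""
    matches = []
    for product in root.iter("Product"):
        # Only direct children are inspected, so a nested Product's
        # identifier never makes its parent match.
        for attr in product.findall("Attribute[@Name='Identifier']"):
            # Surrounding whitespace is stripped here, which is slightly
            # more lenient than the exact comparison in the XPath.
            if (attr.text or "").strip() == identifier:
                matches.append(product)
                break
    return matches


root = ET.fromstring(SAMPLE)
hits = find_products(root, "NumberOne")
for product in hits:
    # Serializes the whole subtree, nested Products included.
    print(ET.tostring(product, encoding="unicode"))
```

Because the matched element is serialized as a whole, nothing inside the Product has to be understood or rebuilt, which is the "grab it completely" behaviour I was after. With lxml (a third-party package) the XPath expression itself can be passed to the tree's xpath() method instead.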

Jack Fleeting · Accepted Answer

It is indeed not a good idea to try to parse XML with regex. Using XPath should get you there, assuming I understand you correctly. For example,

//Product[.//*[.="NumberOne"]]

should output:


      
        
      
      NumberOne

etc.
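One thing to keep in mind with this expression: it matches any Product that has the text somewhere among its descendants, so searching for an identifier that lives in a nested Product also returns the enclosing Product. Below is a rough sketch that emulates the expression with the standard library only, to make that visible. The input document is hypothetical (names follow the question, values are invented), and string_value and products_with_descendant are illustrative helper names, not library functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; element names follow the question, values are invented.
DOC = ET.fromstring(
    '<products>'
    '<Product Id="1">'
    '<Product Id="2"><Attribute Name="Identifier">Inner</Attribute></Product>'
    '<Attribute Name="Identifier">NumberOne</Attribute>'
    '</Product>'
    '<Product Id="3"><Attribute Name="Identifier">NumberTwo</Attribute></Product>'
    '</products>'
)


def string_value(element):
    # XPath string value of an element: all descendant text concatenated.
    return "".join(element.itertext())


def products_with_descendant(root, text):
    """Ids of Products (at any depth) that have some descendant element
    whose string value equals `text` exactly."""
    found = []
    for product in root.iter("Product"):
        # .//* means descendants only, so the Product itself is skipped.
        descendants = (el for el in product.iter() if el is not product)
        if any(string_value(el) == text for el in descendants):
            found.append(product.get("Id"))
    return found


print(products_with_descendant(DOC, "NumberOne"))  # ['1']
print(products_with_descendant(DOC, "Inner"))      # ['1', '2']
```

If only the Product that directly owns the identifier is wanted, restricting the predicate to a child Attribute element (as in the question's update) avoids the extra match.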
