Adi


awk - get string between 2 strings with repeating pattern

I've tried several options from previous posts on Stack Overflow and other sites, but I am still having problems getting the results I need.

The problem I am trying to solve is that I receive an XML string, which I assign to the variable one_ts. The string contains a list of steps, each with a stepIndex (an integer 1..n) and a stepText (a string). I need to read each stepText and do some processing on it until all steps have been read (or until no steps are present in the XML).

The issue is that the awk command I am trying to use to match each stepIndex and extract the text between the XML tags marking the start and end of a step does not match the start/end tags as I expect. Instead, it returns the same string (a substring of the entire XML) for every stepIndex.

The result I want is that, for each value of indexcounter, I retrieve only the fields for the associated stepIndex. For example, when indexcounter=1 I need step_text = "<stepText>Step 1 text</stepText>", when indexcounter=2 I need step_text = "<stepText>Step 2 text</stepText>", and so on.

Can anyone tell me why the awk command is not working as expected, or propose a better command? Thanks.

This is what I have tried - Code snippet:

     one_ts="<steps><ns8:step stepIndex=\"1\" <stepText>Step 1 text</stepText></ns8:step><ns8:step stepIndex=\"2\" <stepText>Step 2 text</stepText></ns8:step><ns8:step stepIndex=\"3\" <stepText>Step 3 text</stepText></ns8:step></steps>"
     indexcounter=1
     
     while true
     do
          if [[ $one_ts =~ "<ns8:step stepIndex=\"$indexcounter\"" ]]; then
               matchpattern="/.*<ns8:step stepIndex=\"$indexcounter\""
               step_text=$(echo "$one_ts" | awk -v pat="$matchpattern" '/pat/{f=1}/<\/ns8:step>.*/{f=0;print;exit}f')
               # some processing on $step_text
          else
              exit;
          fi
          indexcounter=$((indexcounter+1))
     done
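
For comparison, here is a minimal sketch of one way the desired result might be obtained. It is not a confirmed fix. Two things in the snippet above look like likely causes. Inside `/pat/`, awk searches for the literal text `pat` rather than the value of the variable `pat` (a variable has to be matched with something like `$0 ~ pat`). And because the whole XML arrives as a single line, a line-based flag can only ever print that entire line. The sketch splits the line into one chunk per step instead. `get_step_text` is a hypothetical helper name, and the code assumes each step holds exactly one stepText element; it works on the sample string as posted, including the missing `>` after the stepIndex attribute.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the <stepText>...</stepText> element
# that belongs to a single stepIndex (empty output if there is none).
get_step_text() {
    local xml=$1 idx=$2
    printf '%s\n' "$xml" | awk -v idx="$idx" '
        {
            # The input is a single line, so split it into one chunk per step.
            n = split($0, chunk, "</ns8:step>")
            for (i = 1; i <= n; i++) {
                # Use the awk variable idx; writing /idx/ would match the literal text "idx".
                if (index(chunk[i], "stepIndex=\"" idx "\"") &&
                    match(chunk[i], /<stepText>.*<\/stepText>/)) {
                    print substr(chunk[i], RSTART, RLENGTH)
                    exit
                }
            }
        }'
}

one_ts="<steps><ns8:step stepIndex=\"1\" <stepText>Step 1 text</stepText></ns8:step><ns8:step stepIndex=\"2\" <stepText>Step 2 text</stepText></ns8:step><ns8:step stepIndex=\"3\" <stepText>Step 3 text</stepText></ns8:step></steps>"
indexcounter=1

# Stop at the first stepIndex for which nothing is found.
while step_text=$(get_step_text "$one_ts" "$indexcounter") && [[ -n $step_text ]]; do
    echo "$indexcounter: $step_text"   # placeholder for the real processing
    indexcounter=$((indexcounter + 1))
done
```

Matching on `stepIndex="N"` with the closing quote included keeps index 1 from also matching 10, 11, and so on. For anything beyond simple, predictable XML like this, a real XML parser would probably be more robust than awk.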

