Reputation: 19220

How do I get all the text between the last two instances of a token in bash?

I’m using bash and running the following command to get all the file text between two tokens (including the tokens themselves):

cat /usr/java/jboss/standalone/log/server.log | sed -n \
'/Starting deployment of "myproject.war"/,/Registering web context: \/myproject/p'

However, sometimes the tokens occur multiple times in the file. How do I adjust the above so that only the text between the last two occurrences of the tokens (including the tokens themselves) will be returned?

Upvotes: 2

Answers (5)

Håkon Hægland

Reputation: 40778

With perl:

perl -0xFF -nE '@x = /(Starting deployment of "myproject\.war".*?Registering web context: \/myproject)/sg; say $x[-1]' file

Upvotes: 0

mklement0

Reputation: 439193

This solution is not the most efficient, but it is easier to understand:

file='/usr/java/jboss/standalone/log/server.log'

s1='Starting deployment of "myproject.war"'
s2='Registering web context: \/myproject'

sed -n '/'"$s1"'/,/'"$s2"'/p' "$file" | 
  tac | 
  awk '/'"$s1"'/ {print;exit} 1' | 
  tac

sed first reports ALL ranges.
tac reverses the result (on OSX, use tail -r).
awk prints everything up to and including the first occurrence of the first substring, then exits. In the reversed input, that is exactly the last range, running from its end line back to its start line.
The second tac reverses the output from awk to restore the last range to its original order.

Note: For consistency with the variable use in the sed command, I've spliced a variable reference directly into the awk program too, which is otherwise poor practice (use -v to pass variables instead).
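
As a rough illustration of the -v advice in the note, here is a minimal sketch that passes both tokens to awk as variables and matches them literally with index(), so neither the "/" nor the "." in the tokens needs escaping. It swaps the sed/tac pipeline for a single awk pass, and last_range is a hypothetical helper name, not something from the answer above.

```shell
# Sketch only: last_range is a hypothetical helper (not from the answer above).
# Usage: last_range START_TOKEN END_TOKEN FILE
last_range() {
  awk -v s1="$1" -v s2="$2" '
    index($0, s1) { n = 0; inside = 1 }   # start token: drop any earlier range
    inside        { buf[++n] = $0 }       # buffer the lines of the current range
    index($0, s2) { inside = 0 }          # end token: stop buffering
    END           { for (i = 1; i <= n; i++) print buf[i] }
  ' "$3"
}
```

Called as last_range "$s1" 'Registering web context: /myproject' "$file", this should print only the final range. One caveat: awk processes backslash escapes in values assigned with -v, so a token containing a backslash would need it doubled.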

Upvotes: 0

anubhava

Reputation: 785481

This awk can work:

awk '/Starting deployment of "myproject.war"/{i=0; s=1; delete a;}
   s{a[++i]=$0}
   /Registering web context: \/myproject/{s=0}
   END {for (k=1; k<=i; k++) print a[k]}' file

Upvotes: 0

Charles Duffy

Reputation: 295619

You can do this in native bash -- no need for awk, tac, or any other external tool.

token1='Starting deployment of "myproject.war"'
token2='Registering web context: /myproject'
writing=0
collected_content=()             # stays empty if token1 never appears
while read -r; do
  # *"$token"* matches the token anywhere in the line (log lines usually carry a prefix)
  (( ! writing )) && [[ $REPLY = *"$token1"* ]] && {
    # start collecting content, into an empty buffer, when we see token1
    writing=1                    # set flag to store lines we see
    collected_content=()         # clear the array of lines found so far
  }
  (( writing )) && {
    # when the flag is set, collect content into an array
    collected_content+=( "$REPLY" )
  }
  [[ $REPLY = *"$token2"* ]] && {
    # stop collecting content when we see token2
    writing=0
  }
done <server.log # redirect from the log into the loop

# print all collected lines
printf '%s\n' "${collected_content[@]}"

Upvotes: 0

jaypal singh

Reputation: 77135

How about some tic-tac-toe.

tac /usr/java/jboss/standalone/log/server.log | 
awk '/Registering web context: \/myproject/{p=1;++cnt}/Starting deployment of "myproject.war"/{if(cnt==2){print $0;exit};print $0;p=0}p' |
tac
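
The awk program above appears to exit only at the second start token seen from the end, so it would emit the last two ranges. If only the final range is wanted, a variation on the same reverse-and-scan idea could look like the sketch below. last_range_rev is a hypothetical helper name, tac is assumed to be available (GNU coreutils), and the tokens are matched literally with index().

```shell
# Sketch only: last_range_rev is a hypothetical helper; assumes tac is installed.
# Usage: last_range_rev START_TOKEN END_TOKEN FILE
last_range_rev() {
  tac "$3" | awk -v s1="$1" -v s2="$2" '
    index($0, s2)           { inside = 1 }   # reading backwards, the last end token comes first
    inside                  { print }        # emit the range, still in reverse order
    inside && index($0, s1) { exit }         # matching start token reached: stop reading
  ' | tac
}
```

Because awk exits as soon as it reaches the start token, only the tail of the log is scanned, which may matter for a large server.log.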

Upvotes: 1
