Reputation: 391

get specific lines from a repeated range pattern in a text file

Wow, this sounds so complicated in the title, but I assume it is not quite so.

I have text files that have basically this layout:

Stimulus ...
...
...
...
Response
Stimulus ...
...
...
...
Response

I used sed to get everything in between and then further extracted information I needed.

sed -n -e '/Stimulus/,/Response/ p'

However, sometimes the participants do not respond, in which case the file looks like this:

Stimulus ...
...
...
...
Stimulus ...
...
...
...
Response

In this special case, my script will not get what I am looking for. So I am looking for a way to extract the information only if pattern1 (Stimulus) is followed by pattern2 (Response), and not by pattern1 again.

Let me know if my description is unclear. I am more than happy to provide further information.

Upvotes: 6

Answers (6)

TrueY

Reputation: 7610

This is a pure bash solution:

tmp=()
while read l; do
  [[ $l =~ ^Stimulus ]] && tmp=("$l") && continue
  [ ${#tmp[@]} -eq 0 ] && continue
  tmp+=("$l")
  [[ $l =~ ^Response ]] && printf "%s\n" "${tmp[@]}" && tmp=()
done <infile

It starts filling the array tmp when a line starting with Stimulus is found. If another Stimulus arrives, it clears tmp and starts over. If a Response is found, it prints the contents of the tmp array. The printf built-in loops implicitly over its arguments, so no explicit loop is needed.

Input:

cat >infile <<XXX
...
Response 0
...
Stimulus 1
...
Stimulus 2
...
Response 2
...
Stimulus 3
...
Response 3
...
Response 4
XXX

Output:

Stimulus 2
...
Response 2
Stimulus 3
...
Response 3

Upvotes: 5

captcha

Reputation: 3756

Really nice & easy job for GNU sed, one-way, no unwanted pipes & tools:

sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file

Input File:

Stimulus 1...
bad
bad
bad
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
bad
bad
bad
bad
Stimulus 5...
...
...
...
...
Response 5
bad
bad
bad
bad
Response 6
bad
bad
bad

And output:

$ sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 5...
...
...
...
...
Response 5
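
For readers who find the one-liner dense, here is a sketch of the same script spread over several lines with comments. The function name extract_sed, the label name skip and the sample file are made up for this illustration; the sed commands themselves are the ones from the one-liner above.

```shell
# Hypothetical sample file, created only for this sketch.
demo=$(mktemp)
printf '%s\n' 'Stimulus 1' 'bad' 'Stimulus 2' '...' 'Response 2' 'bad' 'Response 3' > "$demo"

# extract_sed is an illustrative wrapper name, not part of the original answer.
extract_sed() {
  sed -n '
    # Append every input line to the hold space.
    H
    /^Stimulus/ {
      # A new block starts: overwrite the hold space with this line only.
      h
      d
    }
    /^Response/ {
      # Swap: the pattern space gets the collected lines,
      # the hold space keeps only this Response line.
      x
      # If the collected text starts with Response, no Stimulus was seen
      # since the previous Response, so jump past the print.
      s/^Response//
      t skip
      p
      :skip
      d
    }
  ' "$1"
}

extract_sed "$demo"
```

As far as I can tell, a Response that appears before the very first Stimulus in the file still gets printed by this script, because at that point the hold space does not yet start with Response.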

And my code for GNU awk:

awk '{a[++i]=$0};/^Response/ && a[1] !~ /^Response/ {for (k=1; k<=i; k++) {print a[k]}};/^Stimulus|^Response/ { delete a; i=0; a[++i]=$0}' file

As you can see, I need too much awk code ...
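
A shorter awk variant is possible if the block is collected in a string instead of an array. This is only a sketch: the names extract_awk, buf and collecting and the sample file are my own assumptions, and it should not need any GNU extensions.

```shell
# Hypothetical sample file, created only for this sketch.
demo=$(mktemp)
printf '%s\n' 'Response 0' 'Stimulus 1' 'bad' 'Stimulus 2' '...' 'Response 2' 'bad' 'Response 3' > "$demo"

# extract_awk is an illustrative wrapper name.
extract_awk() {
  awk '
    # A Stimulus (re)starts the block, discarding anything collected so far.
    /^Stimulus/ { buf = $0; collecting = 1; next }
    # While a block is open, append every line, including the closing Response.
    collecting  { buf = buf ORS $0 }
    # A Response prints the block only if a Stimulus opened it.
    /^Response/ { if (collecting) print buf; collecting = 0 }
  ' "$1"
}

extract_awk "$demo"
```

The collecting flag is what makes isolated Response lines harmless: without an open block nothing is stored and nothing is printed.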

Upvotes: 4

Birei

Reputation: 36262

One dirty way, although it seemed to work in my test, could be to reverse the file content, search from Response to Stimulus, and then reverse the result again.

Assuming following input data:

Stimulus 1...
...
...
...
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
...
...
...
Stimulus 5...

The command:

tac infile | sed -ne '/Response/,/Stimulus/ p' | tac -

Yields:

Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3

EDIT: For input with isolated Response parts, you have to filter twice (based on a comment from the OP):

tac infile | 
  sed -ne '/Response/,/Stimulus/ p' | 
  tac - | 
  sed -ne '/Stimulus/,/Response/ p'
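
To see why the second pass is needed, here is a small sketch that runs both stages on a made-up input with an isolated Response. The function names pass1 and pass2 and the sample data are assumptions for illustration only; tac is the GNU coreutils tool used above.

```shell
# Hypothetical sample: "Response 4" has no Stimulus of its own.
demo=$(mktemp)
printf '%s\n' 'Stimulus 1' 'a' 'Stimulus 2' 'b' 'Response 2' 'c' 'Response 4' > "$demo"

# Pass 1: reading backwards, keep each Response up to the nearest Stimulus above it.
pass1() { tac "$1" | sed -n '/Response/,/Stimulus/p' | tac; }

# Pass 2: reading forwards, keep only Stimulus..Response, dropping what trails behind.
pass2() { pass1 "$1" | sed -n '/Stimulus/,/Response/p'; }

pass1 "$demo"   # still contains the trailing "c" and "Response 4"
echo '--'
pass2 "$demo"   # only the Stimulus 2 block is left
```

The backward pass removes an unanswered Stimulus, but an isolated Response gets merged into the range that starts at it, so the forward pass has to cut it off again.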

Upvotes: 7

jaypal singh

Reputation: 77105

Updated to handle isolated Responses

awk '
/Response/ { 
    if (p==1) {
        for(;k<length(a);) {
            print a[++k]
        }
        print $0
    }
    delete a; i=k=p=0    # also reset i, so the next block is stored from a[1] again
} 
/Stimulus/ {
    if (p==1) {
        delete a; i=0
    }
    p=1
} 
p { a[++i]=$0 }' log

Upvotes: 4

gniourf_gniourf

Reputation: 46833

Here's a pure bash solution that tries to minimize stupid side effects:

#!/bin/bash

out=()

while read -r l; do
   case "$l" in
       Stimulus*) out=( "$l" ) ;;
       Response*) ((${#out[@]}!=0)) && { printf "%s\n" "${out[@]}" "$l"; out=(); } ;;
       *) ((${#out[@]}!=0)) && out+=( "$l" ) ;;
   esac
done < infile

It also handles the case where there's a Response but no Stimulus.

Upvotes: 4

Birei

Reputation: 36262

Another option is to switch to perl and its flip-flop (range operator):

perl -lne '
    BEGIN {
        ## Create regular expression to match the initial and final words.
        ($from_re, $to_re) = map { qr/\A$_/ } qw|Stimulus Response|;
    }
    ## Range, similar to "sed".
    if ( $r = ( m/$from_re/o ... m/$to_re/o ) ) {
        ## If inside the range and found the initial word again, remove 
        ## all lines saved.
        if ( $r > 1 && m/$from_re/o ) {
            @data = ();
        }
        ## Save line.
        push @data, $_;
        ## At the end of the range, print all lines saved.
        if ( $r =~ m/E0\z/ ) {
            printf qq|%s\n|, join qq|\n|, @data;
            @data = ();
        }
    }
' infile

Assuming an input file as:

Stimulus 1...
...
...
...
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
...
...
...
Stimulus 5...

It yields:

Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3

Upvotes: 4
