Reputation: 391
Wow, this sounds so complicated in the title, but I assume it is not quite so.
I have text files that have basically this layout:
Stimulus ...
...
...
...
Response
Stimulus ...
...
...
...
Response
I used sed to get everything in between and then further extracted information I needed.
sed -n -e '/Stimulus/,/Response/ p'
However, sometimes the participants do not respond, in which case the file looks like this:
Stimulus ...
...
...
...
Stimulus ...
...
...
...
Response
In this special case, my script will not get what I am looking for. So I am looking for a way to extract the information if and only if pattern1 is followed by pattern2, not by pattern1 again.
Let me know if my description is unclear. I am more than happy to provide further information.
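To make the failure concrete, here is a small reproduction. It is only a sketch with made-up sample lines (Stimulus A, a1 and so on are hypothetical, not taken from my real files). The range opens at the first Stimulus and only closes at the next Response, so the unanswered block is printed too:

```shell
#!/bin/bash
# Hypothetical sample data: "Stimulus A" never receives a Response.
sample=$(printf '%s\n' 'Stimulus A' 'a1' 'Stimulus B' 'b1' 'Response B')

# The plain range starts at the first Stimulus and ends at the first
# Response, so the unanswered block "Stimulus A / a1" is printed as well.
range_output=$(printf '%s\n' "$sample" | sed -n -e '/Stimulus/,/Response/ p')
printf '%s\n' "$range_output"
```

What I want instead is only the part from Stimulus B to Response B.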
Upvotes: 6
Views: 196
Reputation: 7610
This is a pure bash solution:
tmp=()
while read l; do
[[ $l =~ ^Stimulus ]] && tmp=("$l") && continue
[ ${#tmp[@]} -eq 0 ] && continue
tmp+=("$l")
[[ $l =~ ^Response ]] && printf "%s\n" "${tmp[@]}" && tmp=()
done <infile
It starts filling the array tmp when a line starting with Stimulus is found. If another Stimulus arrives, it just clears tmp and starts the job again. If Response is found, it prints the content of the tmp array. The printf built-in loops implicitly over its arguments.
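The implicit loop is simply printf reusing its format string until all arguments are consumed. A minimal illustration, using a made-up array named demo in place of tmp:

```shell
#!/bin/bash
# Hypothetical array standing in for tmp.
demo=("Stimulus 2" "..." "Response 2")

# The format "%s\n" is applied once per array element,
# so each element ends up on its own line.
printed=$(printf "%s\n" "${demo[@]}")
printf '%s\n' "$printed"
```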
Input:
cat >infile <<XXX
...
Response 0
...
Stimulus 1
...
Stimulus 2
...
Response 2
...
Stimulus 3
...
Response 3
...
Response 4
XXX
Output:
Stimulus 2
...
Response 2
Stimulus 3
...
Response 3
Upvotes: 5
Reputation: 3756
sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file
Input File:
Stimulus 1...
bad
bad
bad
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
bad
bad
bad
bad
Stimulus 5...
...
...
...
...
Response 5
bad
bad
bad
bad
Response 6
bad
bad
bad
And output:
$ sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 5...
...
...
...
...
Response 5
And my code for GNU awk:
awk '{a[++i]=$0};/^Response/ && a[1] !~ /^Response/ {for (k=1; k<=i; k++) {print a[k]}};/^Stimulus|^Response/ { delete a; i=0; a[++i]=$0}' file
As you can see, I need too much awk code ...
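If the amount of awk code is the concern, a more compact variant seems possible by collecting the block in a single string instead of an array. This is only a sketch: the variable names buf and on are my own choice, and the sample lines are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical sample: block 1 is unanswered, "Response 9" is isolated.
filtered=$(printf '%s\n' 'Stimulus 1' 'bad' 'Stimulus 2' 'ok' 'Response 2' 'bad' 'Response 9' |
awk '
  /^Stimulus/ { buf = $0; on = 1; next }      # start the buffer over
  on          { buf = buf ORS $0 }            # collect lines of the open block
  /^Response/ { if (on) print buf; on = 0 }   # print only answered blocks
')
printf '%s\n' "$filtered"
```

Because the flag on is cleared after every Response, a Response without a preceding Stimulus prints nothing.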
Upvotes: 4
Reputation: 36262
One dirty way, although it seemed to work in my test, could be to reverse the file content, search from Response to Stimulus, and reverse the result again.
Assuming the following input data:
Stimulus 1...
...
...
...
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
...
...
...
Stimulus 5...
The command:
tac infile | sed -ne '/Response/,/Stimulus/ p' | tac -
Yields:
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
EDIT: For input with isolated Response parts, you have to filter twice (based on a comment from the OP):
tac infile |
sed -ne '/Response/,/Stimulus/ p' |
tac - |
sed -ne '/Stimulus/,/Response/ p'
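A small check of the double filter, as a sketch with made-up sample lines (it assumes GNU tac is available). The first pass still lets an isolated Response through, because in the reversed stream it opens a range that never finds a Stimulus; the second pass drops it:

```shell
#!/bin/bash
# Hypothetical sample with an isolated "Response 0" before any Stimulus
# and an unanswered "Stimulus 1".
twice=$(printf '%s\n' 'Response 0' 'x' 'Stimulus 1' 'a' 'Stimulus 2' 'b' 'Response 2' |
  tac |
  sed -ne '/Response/,/Stimulus/ p' |
  tac |
  sed -ne '/Stimulus/,/Response/ p')
printf '%s\n' "$twice"
```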
Upvotes: 7
Reputation: 77105
Updated to handle isolated Responses
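awk '
/Response/ {
    # Print the buffered block only if a Stimulus opened it.
    if (p==1) {
        for (k=1; k<=i; k++) {
            print a[k]
        }
        print $0
    }
    # Reset the buffer, its index and the "inside a block" flag.
    # Without resetting i, later blocks would be stored at stale indices.
    delete a; i=p=0
}
/Stimulus/ {
    # A repeated Stimulus discards the unanswered block.
    if (p==1) {
        delete a; i=0
    }
    p=1
}
p { a[++i]=$0 }' log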
Upvotes: 4
Reputation: 46833
Here's a pure bash solution that tries to minimize stupid side effects:
#!/bin/bash
out=()
while read -r l; do
case "$l" in
Stimulus*) out=( "$l" ) ;;
Response*) ((${#out[@]}!=0)) && { printf "%s\n" "${out[@]}" "$l"; out=(); } ;;
*) ((${#out[@]}!=0)) && out+=( "$l" ) ;;
esac
done < infile
It also handles the case where there's a Response but no Stimulus.
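To try this on other input, the same loop can be wrapped in a function that reads from standard input. This is just a sketch: the function name answered_blocks and the sample lines are made up, and the `&&` shortcuts are spelled out as `if` statements.

```shell
#!/bin/bash
# Hypothetical helper: reads lines on stdin, prints only answered blocks.
answered_blocks() {
  local l out=()
  while IFS= read -r l; do
    case "$l" in
      Stimulus*) out=("$l") ;;              # start a new block
      Response*)
        if ((${#out[@]})); then             # only if a Stimulus opened it
          printf '%s\n' "${out[@]}" "$l"
          out=()
        fi ;;
      *)
        if ((${#out[@]})); then out+=("$l"); fi ;;
    esac
  done
}

# Made-up sample: "Response 0" has no Stimulus, "Stimulus 1" has no Response.
result=$(printf '%s\n' 'Response 0' 'Stimulus 1' 'a' 'Stimulus 2' 'b' 'Response 2' | answered_blocks)
printf '%s\n' "$result"
```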
Upvotes: 4
Reputation: 36262
Another option is to switch to perl and its flip-flop (range operator):
perl -lne '
BEGIN {
## Create regular expression to match the initial and final words.
($from_re, $to_re) = map { qr/\A$_/ } qw|Stimulus Response|;
}
## Range, similar to "sed".
if ( $r = ( m/$from_re/o ... m/$to_re/o ) ) {
## If inside the range and found the initial word again, remove
## all lines saved.
if ( $r > 1 && m/$from_re/o ) {
@data = ();
}
## Save line.
push @data, $_;
## At the end of the range, print all lines saved.
if ( $r =~ m/E0\z/ ) {
printf qq|%s\n|, join qq|\n|, @data;
@data = ();
}
}
' infile
Assuming an input file as:
Stimulus 1...
...
...
...
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
...
...
...
Stimulus 5...
It yields:
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Upvotes: 4