Read all lines of a file that are `n` lines before the lines matching a given regex

I would like to read, from the file file.txt, every line whose line number is n lower than that of a line matching a given regex regex. For example, given the file

hello my friend
foo
_bar_
I love this bar
poof
kouki
splash in the water
bar

If regex=bar and n=2, we want to read:

hello my friend
foo
kouki

I solved this with the following cumbersome one-liner:

sed -n `grep -n bar file.txt | awk -F ":" '{print ($1 - 2)}' | tr '\n' 'X' | sed 's+X+p;+g' | sed 's/.$//'` < file.txt
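
A tidier version of the same grep + sed idea might look like the sketch below, which builds the sed script with $(...) and paste -s instead of tr and two extra sed calls. The function name lines_before is hypothetical. Unlike the one-liner, it skips matches on the first n lines, which would otherwise produce an address such as 0p that sed rejects.

```bash
# Hypothetical helper built on the question's grep + sed idea.
# Usage: lines_before REGEX N FILE
lines_before() {
    # grep -n prints "LINENO:text" for every matching line.
    # awk keeps matches with LINENO > n (so a line n above exists)
    # and turns each one into a sed command "<LINENO-n>p".
    # paste joins the commands with ';', giving e.g. "1p;2p;6p".
    sed -n "$(grep -n -e "$1" "$3" |
              awk -F: -v n="$2" '$1 > n { print ($1 - n) "p" }' |
              paste -s -d ';' -)" "$3"
}
```

With the example file, lines_before bar 2 file.txt runs sed -n '1p;2p;6p' and prints hello my friend, foo and kouki. It still reads the file twice, once with grep and once with sed.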

Is there a better (faster, easier to read) solution?

(My goal with this question is purely educational)

Answers (3)

dawg

With awk:

$ awk '/bar/ && FNR>2 {print li[-2]}
       {li[-2]=li[-1]; li[-1]=$0}' file
hello my friend
foo
kouki

This can be generalized to print the nth line before each match, without holding the entire file in memory:

$ awk -v n=3 '/bar/ && FNR>n{ print li[n]}
              {for (i=n;i>1;i--) 
                    li[i]=li[i-1]
               li[1]=$0}' file
hello my friend
poof
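
A possible variation, not part of the original answer: instead of shifting the array on every line, a ring buffer indexed by FNR % n does constant work per line, and passing the regex with -v avoids hard-coding bar. The wrapper name awk_before is hypothetical. Note that awk processes backslash escapes in -v values, so backslashes in the regex need to be doubled.

```bash
# Hypothetical wrapper around a ring-buffer variant of the awk answer.
# Usage: awk_before REGEX N FILE   (N must be >= 1)
awk_before() {
    awk -v re="$1" -v n="$2" '
        # Slot FNR % n still holds line FNR - n until it is overwritten
        # below, so print it first when the current line matches.
        $0 ~ re && FNR > n { print buf[FNR % n] }
        { buf[FNR % n] = $0 }
    ' "$3"
}
```

For example, awk_before bar 3 file.txt prints hello my friend and poof, the same as the n=3 command.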

RomanPerekhrest

Short sed approach:

sed -n '1N;2N;/bar[^\n]*$/P;N;D' file.txt

The output:

hello my friend
foo
kouki

Details:

1N;2N; - reads the first 3 lines into the pattern space
/bar[^\n]*$/ - checks whether the last line of the pattern space matches bar ([^\n]*$ ensures the match is on the last of the 3 captured lines)
P; - if it matches, prints the first line of the pattern space
N - appends a newline to the pattern space, then appends the next line of input
D - deletes the text in the pattern space up to the first newline and restarts the cycle with the remaining pattern space, without reading new input (for the first 3 lines: hello my friend is printed and deleted, and the new cycle starts at foo)
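
The 1N;2N; prefix is what fixes n=2, giving a window of n+1 = 3 lines. A sketch of a generalization that emits n such N commands before the same P;N;D loop is shown here. It assumes GNU sed (for [^\n] inside a bracket expression) and a simple regex with no "/", no "^" anchor, and nothing that can match a newline; sed_before is a hypothetical name.

```bash
# Hypothetical helper: generalizes "1N;2N;/bar[^\n]*$/P;N;D" to any n >= 1.
# Assumes GNU sed and a simple REGEX (no "/", no "^", cannot match "\n").
# Usage: sed_before REGEX N FILE
sed_before() (
    # Runs in a subshell, so i and prefix do not leak to the caller.
    i=1 prefix=''
    # Build "1N;2N;...;nN;" so the first cycle loads a window of n+1 lines.
    while [ "$i" -le "$2" ]; do
        prefix="${prefix}${i}N;"
        i=$((i + 1))
    done
    # Print the window's first line whenever its last line matches REGEX,
    # then slide the window forward by one line (N appends, D drops).
    sed -n "${prefix}/$1[^\n]*\$/P;N;D" "$3"
)
```

With these arguments, sed_before bar 2 file.txt runs exactly the script from the answer.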

F. Hauri - Give Up GitHub

Pure bash

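o=0 a=()
while IFS= read -r line;do
    a+=("${line}")
    # (( o >= 2 )) skips matches on the first two lines, where o-2 would be negative
    [ "$line" ] && [ -z "${line//*bar*}" ] && (( o >= 2 )) && echo "${a[o-2]}"
    ((o++))
  done <file.txt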
hello my friend
foo
kouki

Or, since you are asking about a regex:

o=0 a=()
while IFS= read -r line;do
    a+=("${line}")
    [[ ${line}  =~ bar ]] && (( o >= 2 )) && echo "${a[o-2]}"
    ((o++))
  done <file.txt

But for performance, I prefer the first syntax.

As a function

grepIndex () { 
    local o=0 a=() line
    while IFS= read -r line; do
        a+=("${line}")
        # $1 is a glob pattern here, not a regex
        [ "$line" ] && [ -z "${line//*$1*}" ] && (( o >= $2 )) && echo "${a[o-$2]}"
        ((o++))
    done
}

grepIndex <file.txt bar 2
hello my friend
foo
kouki

Which could also be written with a regex match:

grepIndex() {
    local o=0 a=() line
    while IFS= read -r line;do
        a+=("${line}")
        [[ ${line} =~ $1 ]] && (( o >= $2 )) && echo "${a[o-$2]}"
        ((o++))
    done
}
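
Both grepIndex versions keep every line of the file in the array a. If memory matters, a sketch of a variant (the name grepIndexRing is hypothetical) keeps only the last n lines in a ring buffer, using the same [[ =~ ]] regex test:

```bash
# Hypothetical bounded-memory variant of grepIndex:
# keeps only the last n lines in a ring buffer instead of the whole file.
# Usage: grepIndexRing REGEX N < FILE   (N must be >= 1)
grepIndexRing() {
    local re=$1 n=$2 line i=0
    local -a buf=()
    # IFS= keeps leading/trailing blanks; the || test also handles a last
    # line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Slot i % n still holds line i - n (0-based) until overwritten below.
        if (( i >= n )) && [[ $line =~ $re ]]; then
            printf '%s\n' "${buf[i % n]}"
        fi
        buf[i % n]=$line
        i=$(( i + 1 ))
    done
}
```

grepIndexRing bar 2 <file.txt prints hello my friend, foo and kouki. It is still a bash read loop, so it saves memory but not time on big files.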

Note:

Pure bash is quicker on small files, since it does not start an external process, but on big files a bash read loop becomes far too slow. Have a look at RomanPerekhrest's answer: sed is probably one of the most efficient tools for this on big files.