Reputation: 57

sed: print text between two patterns and select the nth occurrence

I've spent hours searching the net but I can't find a solution to a problem that looks so easy...

I have a file with multiple pattern matches:

----PATERN1----
textaa1
textbb1
textcc1
.......
----PATERN2----
----PATERN1----
textaa2
textbb2
textcc2
.......
----PATERN2----
----PATERN1----
textaa2
textbb2
textcc2
.......
----PATERN2----
etc...

This is the output I get with the command

sed -n '/PATERN1/,/PATERN2/p' file

But the question is: how can I choose only the nth occurrence (1, 2, 3, etc., which I could then replace with a variable)? Thanks in advance.

Upvotes: 0

Answers (3)

potong

Reputation: 58361

This might work for you (GNU sed):

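sed -nr '/PATERN1/H;//,/PATERN2/G;/\n(\n[^\n]+){2}$/P' file

This uses the hold space as a counter: every line matching PATERN1 is appended to the hold space, the hold space is then appended to each line in the PATERN1 to PATERN2 range, and a line is printed only when exactly the required number of entries follows it. In the command above that number is 2.

N.B. This assumes that every PATERN1 has a matching PATERN2 throughout the file.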
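
Since the question asks for the occurrence number to come from a variable, here is a minimal sketch of one way to do that. It assumes GNU sed, uses the PATERN1/PATERN2 spelling from the sample file, and wraps the command in a hypothetical shell function named nth_block_sed (the function name and its arguments are illustrative, not part of the answer). The single-quoted script is closed around the interval count so the shell can splice the number in:

```shell
# Hypothetical helper: print the n-th PATERN1..PATERN2 block (GNU sed assumed).
# Usage: nth_block_sed N FILE
nth_block_sed() {
    n=$1
    file=$2
    # The hold space gains one "\n<PATERN1 line>" entry per block start.
    # The interval {n} keeps only lines followed by exactly n such entries.
    sed -nr '/PATERN1/H;//,/PATERN2/G;/\n(\n[^\n]+){'"$n"'}$/P' "$file"
}
```

Calling nth_block_sed 2 file should then behave like the command above, and n can be any positive integer held in a shell variable.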

Upvotes: 1

RavinderSingh13

Reputation: 133428

Could you please try the following awk too? You can pass the occurrence number to it as well.

 awk -v occur=2 -v regex1="PATERN1" -v regex2="PATERN2" '$0 ~ regex1{count++;val=""} {val=(val==""?$0:val ORS $0)} ($0 ~ regex2) && (++count==(occur * 2)){print val;exit}'  Input_file

Adding a non-one-liner form of the solution here too.

awk -v occur=2 -v regex1="PATERN1" -v regex2="PATERN2" '
# Block start: count it and begin collecting from scratch.
$0 ~ regex1{
  count++
  val=""
}
# Collect the current line.
{
  val=(val=="" ? $0 : val ORS $0)
}
# Block end: count it, then print and stop once the wanted block is complete.
($0 ~ regex2) && (++count==(occur * 2)){
  print val
  exit
}
'   Input_file

Solution 2nd: In case your Input_file is broken and doesn't have PATERN1 followed by PATERN2 in each block, the following may help too.

awk -v occur=2 -v regex1="PATERN1" -v regex2="PATERN2" '
# Block start: discard any unterminated block collected so far.
$0 ~ regex1{
  val=""
  flag=1
}
# Collect lines only while inside a block.
flag{
  val=(val=="" ? $0 : val ORS $0)
}
# Block end: count only complete blocks, print the wanted one and stop.
($0 ~ regex2) && flag{
  flag=""
  if(++count==occur){
    print val
    exit
  }
}
'  Input_file

PS: Here I am treating everything from PATERN1 to PATERN2 as one occurrence.

PS for the 2nd solution: if the file contains fewer than occur complete blocks, nothing is printed.

Upvotes: 0

Ed Morton

Reputation: 203169

It IS so easy, but you're trying to use the wrong tool. sed is for s/old/new/ and that is all; for anything else, such as what you're doing, you should be using awk instead.

$ awk -v n=2 '
    /PATERN1/ {f=1; rec=""}
    f {
        rec = rec $0 ORS
        if (/PATERN2/) {
            if (++c == n) {
                printf "%s", rec
            }
            f=0
        }
    }' file
----PATERN1----
textaa2
textbb2
textcc2
.......
----PATERN2----
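
To take the occurrence number from a shell variable, as the question asks, the same script can be wrapped in a small function. This is only a sketch: the function name nth_block and its argument order are made up for illustration, and the exit is an addition that stops reading once the wanted block has been printed.

```shell
# Hypothetical wrapper: print the n-th PATERN1..PATERN2 block of a file.
# Usage: nth_block N FILE
nth_block() {
    awk -v n="$1" '
        /PATERN1/ { f = 1; rec = "" }      # start (or restart) collecting a block
        f {
            rec = rec $0 ORS
            if (/PATERN2/) {               # block complete
                if (++c == n) {
                    printf "%s", rec
                    exit                   # nothing after the n-th block is needed
                }
                f = 0
            }
        }' "$2"
}
```

With that in place, something like n=2; nth_block "$n" file selects the second block, and a number larger than the block count prints nothing.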

Note that the above will work in any awk in any shell on any UNIX system, and with it you don't need to test for either PATERN multiple times. If you want to choose a different record number to print, you just change the value of n on the command line. If you want to print multiple records by their numbers, it's a trivial, obvious tweak:

$ awk -v n=2 -v m=7 '
    /PATERN1/ {f=1; rec=""}
    f {
        rec = rec $0 ORS
        if (/PATERN2/) {
            if ( (++c == n) || (c == m) ) {
                printf "%s", rec
            }
            f=0
        }
    }' file
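
If the list of record numbers grows beyond two, one possible generalization (a sketch, not part of the script above) is to pass them as a comma-separated string and look each block number up in an array. The function name blocks_by_number and the variable names list, nums and want are assumptions made for this example.

```shell
# Hypothetical wrapper: print several blocks chosen by number.
# Usage: blocks_by_number "2,7" FILE
blocks_by_number() {
    awk -v list="$1" '
        BEGIN {
            # Turn "2,7" into want[2] and want[7] for quick membership tests.
            cnt = split(list, nums, ",")
            for (i = 1; i <= cnt; i++) want[nums[i] + 0]
        }
        /PATERN1/ { f = 1; rec = "" }
        f {
            rec = rec $0 ORS
            if (/PATERN2/) {
                if ((++c) in want) printf "%s", rec
                f = 0
            }
        }' "$2"
}
```

Blocks are printed in file order regardless of the order of the numbers in the list.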

If you want to test for specific text "foo" within the block instead of (or in addition to) testing a number it's also trivial and obvious:

$ awk '
    /PATERN1/ {f=1; rec=""}
    f {
        rec = rec $0 ORS
        if (/PATERN2/) {
            if (rec ~ /foo/) {
                printf "%s", rec
            }
            f=0
        }
    }' file

If you want to print specific lines within each block or remove newlines or anything else at all it's also trivial and obvious because the above is using the right tool for the job.
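
As one illustration of printing a specific line within a block, here is a sketch that prints only line k of block n, counting the PATERN1 line as line 1. The function name block_line and the variables k and ln are hypothetical, and unlike the scripts above it counts blocks at PATERN1 rather than at PATERN2.

```shell
# Hypothetical wrapper: print only line K of block N (the PATERN1 line is line 1).
# Usage: block_line N K FILE
block_line() {
    awk -v n="$1" -v k="$2" '
        /PATERN1/ { f = 1; ln = 0; c++ }   # new block: bump block count, reset line count
        f {
            if (c == n && ++ln == k) { print; exit }
            if (/PATERN2/) f = 0           # block finished
        }' "$3"
}
```

For the sample input in the question, block_line 2 2 file would print the first text line of the second block.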

Upvotes: 1
