Reputation: 57
I've spent hours searching the net, but I can't find a solution to a problem that looks so easy.
I have a file with multiple pattern matches:
----PATERN1----
textaa1
textbb1
textcc1
.......
----PATERN2----
----PATERN1----
textaa2
textbb2
textcc2
.......
----PATERN2----
----PATERN1----
textaa2
textbb2
textcc2
.......
----PATERN2----
etc...
This is the output I get with the command:
sed -n '/PATERN1/,/PATERN2/p' file
But how can I choose only the nth occurrence (1, 2, 3, etc., which I can then replace with a variable)? Thanks in advance.
Upvotes: 0
Views: 148
Reputation: 58361
This might work for you (GNU sed):
sed -nr '/PATTERN1/H;//,/PATTERN2/G;/\n(\n[^\n]+){2}$/P' file
This uses the hold space as a counter and prints only the lines that belong to the required occurrence; in the above, that number is 2.
N.B. Assumes that every PATTERN1 has a matching PATTERN2 throughout the file.
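Since the question asks for the occurrence number to come from a variable, here is a minimal sketch of the same command with the count taken from a shell variable. The variable name n and the temporary sample file are assumptions for illustration, the patterns use the question's spelling (PATERN1/PATERN2), and GNU sed is still required.

```shell
# Build a small sample file (hypothetical data in the question's layout).
file=$(mktemp)
cat > "$file" <<'EOF'
----PATERN1----
textaa1
textbb1
----PATERN2----
----PATERN1----
textaa2
textbb2
----PATERN2----
----PATERN1----
textaa3
textbb3
----PATERN2----
EOF

n=2   # which occurrence to print (assumed variable name)

# Double quotes let the shell expand $n; \$ keeps the regex end anchor literal.
out=$(sed -nr "/PATERN1/H;//,/PATERN2/G;/\n(\n[^\n]+){$n}\$/P" "$file")
printf '%s\n' "$out"
rm -f "$file"
```

Changing n to 1 or 3 should select the first or third block instead, as long as the markers stay paired.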
Upvotes: 1
Reputation: 133428
Could you please try the following awk too? You can pass the occurrence number to it as a variable.
awk -v occur=2 -v regex1="PATERN1" -v regex2="PATERN2" '$0 ~ regex1{count++;val=""} {val=val?val ORS $0:$0} $0 ~ regex2{count++;if((occur * 2)==count){print val}}' Input_file
Here is the same solution in multi-line form:
awk -v occur=2 -v regex1="PATERN1" -v regex2="PATERN2" '
$0 ~ regex1{
  count++
  val=""                    # start collecting a fresh block
}
{
  val=val?val ORS $0:$0
}
$0 ~ regex2{
  count++
  if((occur * 2)==count){   # closing marker of the wanted block
    print val
  }
}
' Input_file
2nd solution: in case your Input_file is broken and does not have a PATERN1 followed by a PATERN2 in every block, the following may help:
awk -v occur=2 -v regex1="PATERN1" -v regex2="PATERN2" '
$0 ~ regex1{
  val=""        # a start marker always begins a fresh block, so an
  flag=1        # unterminated block before it is thrown away
}
flag{
  val=val?val ORS $0:$0
}
$0 ~ regex2 && flag{
  if(++count==occur){   # count only blocks that were properly closed
    print val
  }
  flag=""
}
' Input_file
PS: Here I am assuming that one occurrence means one block from PATERN1 through PATERN2.
PS for the 2nd solution: a PATERN1 that is followed by another PATERN1 before any PATERN2 starts an incomplete block, which is discarded and not counted. Nothing is printed if the file holds fewer than occur complete blocks.
Upvotes: 0
Reputation: 203169
It IS so easy, but you're trying to use the wrong tool. sed is for s/old/new/, that is all; for anything else, such as what you're doing, you should be using awk instead.
$ awk -v n=2 '
/PATERN1/ {f=1; rec=""}
f {
    rec = rec $0 ORS
    if (/PATERN2/) {
        if (++c == n) {
            printf "%s", rec
        }
        f=0
    }
}' file
----PATERN1----
textaa2
textbb2
textcc2
.......
----PATERN2----
Note that the above will work in any awk in any shell on any UNIX system, and you don't need to test for either PATERN multiple times. If you want to choose a different record number to print, you just change the value of n on the command line. If you want to print multiple records by their numbers, it's a trivial, obvious tweak:
$ awk -v n=2 -v m=7 '
/PATERN1/ {f=1; rec=""}
f {
    rec = rec $0 ORS
    if (/PATERN2/) {
        if ( (++c == n) || (c == m) ) {
            printf "%s", rec
        }
        f=0
    }
}' file
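For more than a couple of record numbers, one possible generalization is to pass them as a comma-separated list. This is only a sketch and not part of the answer's code: the variable name nums, the arrays tmp and want, and the sample file are assumptions made for illustration.

```shell
# Build a small sample file (hypothetical data in the question's layout).
file=$(mktemp)
cat > "$file" <<'EOF'
----PATERN1----
textaa1
textbb1
----PATERN2----
----PATERN1----
textaa2
textbb2
----PATERN2----
----PATERN1----
textaa3
textbb3
----PATERN2----
EOF

nums="1,3"   # hypothetical: record numbers to print

out=$(awk -v nums="$nums" '
BEGIN {
    # Turn "1,3" into want[1] and want[3] for membership tests.
    cnt = split(nums, tmp, ",")
    for (i = 1; i <= cnt; i++) want[tmp[i]]
}
/PATERN1/ {f=1; rec=""}
f {
    rec = rec $0 ORS
    if (/PATERN2/) {
        if ((++c) in want) {
            printf "%s", rec
        }
        f=0
    }
}' "$file")
printf '%s\n' "$out"
rm -f "$file"
```

The block-collecting logic is unchanged; only the test on c differs, so the list can grow without adding more variables.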
If you want to test for specific text "foo" within the block instead of (or in addition to) testing a number it's also trivial and obvious:
$ awk '
/PATERN1/ {f=1; rec=""}
f {
    rec = rec $0 ORS
    if (/PATERN2/) {
        if (rec ~ /foo/) {
            printf "%s", rec
        }
        f=0
    }
}' file
If you want to print specific lines within each block or remove newlines or anything else at all it's also trivial and obvious because the above is using the right tool for the job.
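As one illustration of the "specific lines within each block" case, the sketch below prints only line k of block n. The variable k, the array lines, and the sample file are assumptions added here for illustration; the marker lines themselves count as the first and last lines of a block.

```shell
# Build a small sample file (hypothetical data in the question's layout).
file=$(mktemp)
cat > "$file" <<'EOF'
----PATERN1----
textaa1
textbb1
----PATERN2----
----PATERN1----
textaa2
textbb2
----PATERN2----
EOF

out=$(awk -v n=2 -v k=2 '
/PATERN1/ {f=1; cnt=0}
f {
    lines[++cnt] = $0            # store the block line by line
    if (/PATERN2/) {
        # Print line k of the nth block, if the block is that long.
        if (++c == n && k <= cnt) {
            print lines[k]
        }
        f=0
    }
}' "$file")
printf '%s\n' "$out"
rm -f "$file"
```

Storing the block in an array instead of one string is a design choice for this sketch: it makes individual lines addressable without splitting rec again.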
Upvotes: 1