Reputation: 2157
I have a file containing blocks of 4 lines that belong together. The structure looks like this:
@A1
ABCGKJTGE
+
A4
@B1
ACDFS
+
B4
@C1
SFDGDGDAD
+
C4
Whenever the second line of a block is not exactly 9 characters long, I want the whole block of 4 lines to be removed. In this case, the 'B' block would be removed, so my output file would look like this:
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4
I would like to use 'awk' to do this, but I'm not sure how in this case.
Upvotes: 0
Views: 1127
Reputation: 58617
Solution in TXR:
@(repeat)
@@@head
@{line2 9}
@line3
@line4
@ (output)
@@@head
@line2
@line3
@line4
@ (end)
@(end)
Run:
$ txr data.txr data
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4
Upvotes: 0
Reputation: 37424
Yet another AWK solution, inspired by a previous solution:
$ cat > yetanother.awk
{
a=a $0 ORS # thanks @Ed Morton
}
NR%4==0 { # for every 4th record
split(a,b,ORS) # split gathered a to b
if(length(b[2])==9) # if the second record in block is 9 characters long
printf "%s", a # print it
a="" # reset a
}
And testing it:
$ awk -f yetanother.awk structure.txt
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4
Upvotes: 1
Reputation: 827
Using GNU awk multi-line records (this relies on the gawk-specific RT variable):
awk '
BEGIN{ RS="(^|\n)@[^\n]*\n" }
length($1) == 9 {printf("%s%s", prt, $0)}
{prt=RT}
'
Upvotes: 0
Reputation: 204015
$ cat tst.awk
NR%4 == 2 { lgth = length() }
{ rec = rec $0 ORS }
NR%4 == 0 {
if ( lgth == 9 ) {
printf "%s", rec
}
rec = ""
}
$ awk -f tst.awk file
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4
Upvotes: 1
Reputation: 60143
Here's a sed solution:
sed -E 'N;N;N;/.*\n[^\n]{9}\n.*\n/ !d' test.txt
(Depending on your OS, -E may need to be -r instead.)
This should be read as "When you find a line, read three more lines (giving us four total), look for a second line that's exactly 9 characters long, and if not found, delete all four lines."
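The same reading can be spelled out step by step. The following is only an annotated sketch of the command above, wrapped in a shell function whose name (keep_len9_blocks) is made up for illustration; it assumes a sed that accepts comment lines inside a script, as GNU sed does.

```shell
# keep_len9_blocks is a hypothetical helper name, not part of the original answer.
keep_len9_blocks() {
    sed -E '
        # Append the next three input lines to the pattern space,
        # so it holds one block of four lines joined by \n.
        N;N;N
        # Delete the whole pattern space unless the second line
        # of the block is exactly 9 characters long.
        /.*\n[^\n]{9}\n.*\n/ !d
    ' "$1"
}
```

Calling keep_len9_blocks test.txt should then behave like the one-liner.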
For a test.txt with this content:
@A1
ABCGKJTGE
+
A4
@B1
ACDFS
+
B4
@C1
SFDGDGDAD
+
C4
The output is:
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4
Upvotes: 1
Reputation: 37298
awk '{
arr[NR%4]=$0
#dbg print "dbg: NR%4=" NR%4 "\tarr[2]="arr[2]"\tlen="length(arr[2])
if (NR%4==0 && (length(arr[2]) == 9)) {
print arr[1]"\n"arr[2]"\n"arr[3]"\n"arr[0]
}
} ' data
output
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4
The one tricky part here is that arr[NR%4] stores every fourth line in arr[0], because NR%4 is 0 whenever NR is a multiple of 4. So we have to change from the "logical" arr[4] to arr[0] in the print statement.
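To see the index mapping for yourself, here is a small sketch that prints the slot each line number lands in. The variable name slots and the use of seq to generate eight dummy lines are just for illustration.

```shell
# Print which array slot each of 8 input lines would be stored in
# when the index is NR % 4 (slots is an illustrative variable name).
slots=$(seq 1 8 | awk '{ print "line " NR " is stored in arr[" (NR % 4) "]" }')
echo "$slots"
```

Lines 4 and 8 are reported as going to arr[0], which is why the print statement ends with arr[0] rather than arr[4].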
If you have more than 4 lines per "record" you can parameterize that value (recSize below, with lines stored via arr[NR%recSize]) and then use it to drive a for loop to print the saved record, i.e.
for (i=1; i<recSize; i++) {
print arr[i]
}
print arr[0]
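Put together, a parameterized version might look like the sketch below. The function name filter_blocks and the awk variable want (the required length of the second line) are made up for illustration, and the sketch assumes recSize is at least 3 so that arr[2] really is the second line of a block.

```shell
# filter_blocks FILE [RECSIZE] [WANT]
# Hypothetical helper: RECSIZE defaults to 4 lines per record,
# WANT defaults to 9 characters for the second line.
filter_blocks() {
    awk -v recSize="${2:-4}" -v want="${3:-9}" '
    { arr[NR % recSize] = $0 }           # last line of each record lands in arr[0]
    NR % recSize == 0 && length(arr[2]) == want {
        for (i = 1; i < recSize; i++)    # lines 1 .. recSize-1 of the record
            print arr[i]
        print arr[0]                     # then the last line
    }' "$1"
}
```

With the defaults, filter_blocks data should give the same result as the command above; filter_blocks data 6 9 would treat the input as six-line records instead.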
IHTH
Upvotes: 0
Reputation: 195189
This awk cmd does the job:
awk '{a[NR]=$0}
END{for(i=2;i<=NR;i+=4)
if(length(a[i])==9)
p[i-1]=p[i]=p[i+1]=p[i+2]=1
for(x=1;x<=NR;x++)
if(p[x])print a[x]}' file
The idea is to save all lines in an array, check the second line of each block, and mark the four lines of the block for printing only if that line is 9 characters long.
Test with your example:
kent$ cat f
A1
NNNNNNNNN
A3
A4
B1
NNNNNNN
B3
B4
C1
NNNNNNNNN
C3
C4
kent$ awk '{a[NR]=$0}
END{for(i=2;i<=NR;i+=4)
if(length(a[i])==9)
p[i-1]=p[i]=p[i+1]=p[i+2]=1
for(x=1;x<=NR;x++)
if(p[x])print a[x]}' f
A1
NNNNNNNNN
A3
A4
C1
NNNNNNNNN
C3
C4
Upvotes: 1