Reputation: 2157

remove lines when string has certain length with awk or sed

I have a file containing blocks of 4 lines that belong together. The structure looks like this:

@A1
ABCGKJTGE
+
A4
@B1
ACDFS
+
B4
@C1
SFDGDGDAD
+
C4

Now, when the length of the second line of a block is not equal to 9, I want the whole block of 4 lines to be removed. In this case, the B block would be removed, so my output file would look like this:

@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4

I would like to use 'awk' for this, but I'm not sure how.

Upvotes: 0

Answers (7)

Kaz

Reputation: 58617

Solution in TXR:

@(repeat)
@@@head
@{line2 9}
@line3
@line4
@  (output)
@@@head
@line2
@line3
@line4
@  (end)
@(end)

Run:

$ txr data.txr data
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4

Upvotes: 0

James Brown

Reputation: 37424

Yet another AWK solution, inspired by a previous solution:

$ cat > yetanother.awk
{
    a=a $0 ORS                       # thanks @Ed Morton
}
NR%4==0 {                            # for every 4th record
    split(a,b,ORS)                   # split gathered a to b
    if(length(b[2])==9)              # if the second record in block is 9 chars long
        printf "%s", a                # print it
    a=""                             # reset a
}

And testing it:

$ awk -f yetanother.awk structure.txt
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4

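Why the placement of the parentheses in the if matters: in length(b[2]==9) the comparison b[2]==9 is evaluated first, giving 0 or 1, and length() of that value is always 1. The condition is then always true and every block gets printed, so the closing parenthesis has to come right after b[2]. A minimal check on the 5-character line ACDFS from block B (an illustration only, not part of the solution):

```shell
# Left column:  length() applied to the comparison result (always 1).
# Right column: the intended test, the line's length compared with 9.
echo ACDFS | awk '{ print length($0==9), (length($0)==9) }'
```

This prints "1 0": the misplaced form reports a match even though ACDFS is only 5 characters long.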
Upvotes: 1

pakistanprogrammerclub

Reputation: 827

Using GNU awk multiline records (the RT variable is gawk-specific):

awk '
BEGIN{ RS="(^|\n)@[^\n]*\n" }
length($1) == 9 {printf("%s%s", prt, $0)}
{prt=RT}
'

Upvotes: 0

Ed Morton

Reputation: 204015

$ cat tst.awk
NR%4 == 2 { lgth = length() }
{ rec = rec $0 ORS }
NR%4 == 0 {
    if ( lgth == 9 ) {
        printf "%s", rec
    }
    rec = ""
}

$ awk -f tst.awk file
@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4

Upvotes: 1

user94559

Reputation: 60143

Here's a sed solution:

sed -E 'N;N;N;/.*\n[^\n]{9}\n.*\n/ !d' test.txt

(Depending on your OS, -E may need to be -r instead.)

This should be read as "When you find a line, read three more lines (giving us four total), look for a second line that's exactly 9 characters long, and if not found, delete all four lines."
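To see the text that the address regex is applied to, you can print the pattern space after N;N;N with sed's standard l command, which shows embedded newlines as \n and marks the end with $. This sketch uses only the first block of the sample data:

```shell
# Join the four lines of block A into one pattern space and show it unambiguously.
printf '%s\n' @A1 ABCGKJTGE + A4 | sed -n 'N;N;N;l'
```

The output is @A1\nABCGKJTGE\n+\nA4$. Because the regex needs a \n, then exactly 9 non-newline characters, then a \n that is followed later by another \n, the only line it can match in a four-line block is the second one.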

For a test.txt with this content:

@A1
ABCGKJTGE
+
A4
@B1
ACDFS
+
B4
@C1
SFDGDGDAD
+
C4

The output is:

@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4

Upvotes: 1

shellter

Reputation: 37298

awk '{
        arr[NR%4]=$0
        #dbg print "dbg: NR%4=" NR%4 "\tarr[2]="arr[2]"\tlen="length(arr[2])
        if (NR%4==0 && (length(arr[2]) == 9)) {
                print arr[1]"\n"arr[2]"\n"arr[3]"\n"arr[0]
        }
} ' data

Output:

@A1
ABCGKJTGE
+
A4
@C1
SFDGDGDAD
+
C4

The one tricky part here is that for the 4th line of each block NR%4 is 0, so that line is stored in arr[0]. That is why the print statement uses arr[0] where you might logically expect arr[4].

If you have more than 4 lines per "record", you can parameterize that value and use it to drive a for loop that prints the saved record. Since NR%recSize only runs from 1 to recSize-1 before wrapping to 0, the loop stops before recSize, e.g.

 for (i=1; i<recSize; i++) {
   print arr[i]
 }
 print arr[0]
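Putting that together, here is a sketch of the same array approach with the block size and the required length passed in with awk's standard -v option. The function name filter_blocks and the variable names recSize and len are hypothetical, chosen for this example; it assumes the line being checked is line 2 of each block and that recSize is at least 3 (so line 2 is stored in arr[2]). Because NR%recSize runs 1, 2, ..., recSize-1, 0, the loop must stop at recSize-1 and the last line of the block comes from arr[0]:

```shell
#!/bin/sh
# Keep only blocks whose second line is exactly len characters long.
# filter_blocks, recSize and len are illustrative names (assumption).
# Assumes recSize >= 3 so that line 2 of each block lives in arr[2].
filter_blocks() {
    awk -v recSize="$1" -v len="$2" '
    {
        arr[NR % recSize] = $0            # slots 1..recSize-1, then 0 for the last line
        if (NR % recSize == 0 && length(arr[2]) == len) {
            for (i = 1; i < recSize; i++)
                print arr[i]
            print arr[0]                  # last line of the block
        }
    }'
}

# Sample data from the question: 4-line blocks, second line must be 9 chars.
printf '%s\n' @A1 ABCGKJTGE + A4 @B1 ACDFS + B4 @C1 SFDGDGDAD + C4 |
    filter_blocks 4 9
```

With the sample data this prints the A and C blocks and drops the B block. Every slot of arr is overwritten within each block, so no values leak from one block into the next.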

IHTH

Upvotes: 0

Kent

Reputation: 195189

This awk cmd does the job:

 awk '{a[NR]=$0}
    END{for(i=2;i<=NR;i+=4)
            if(length(a[i])==9)
                p[i-1]=p[i]=p[i+1]=p[i+2]=1
        for(x=1;x<=NR;x++)
                if(p[x])print a[x]}' file

The idea is to save all lines in an array, check the second line of each block, and decide whether the "block" should be printed or not.

Test with data shaped like your example:

kent$  cat f
A1
NNNNNNNNN
A3
A4
B1
NNNNNNN
B3
B4
C1
NNNNNNNNN
C3
C4

kent$  awk '{a[NR]=$0}
        END{for(i=2;i<=NR;i+=4)
                        if(length(a[i])==9)
                                p[i-1]=p[i]=p[i+1]=p[i+2]=1
                for(x=1;x<=NR;x++)
                        if(p[x])print a[x]}' f
A1
NNNNNNNNN
A3
A4
C1
NNNNNNNNN
C3
C4

Upvotes: 1
