Compare current line and next line in awk

Question

I want to find a pattern where column 2 is 'C' in the current line, column 2 is 'G' in the next line, and column 4 is 'CG' in both lines. I want to compare the 1st line to the 2nd, the 3rd to the 4th, the 5th to the 6th, and so on, and then print each matching pair (the current line and the next line). The 'C' can appear on both even and odd lines.

Input like this:

chr1    C   10467   CHH CT  0.0 0   1
chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8

Expected output (tab-delimited):

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8

My code is:

awk '{a=$2; c=$4; d=$0; e=NR; getline; f=$2; g=$4} {if (a == "C" && f == "G" && c == "CG" && g == "CG") {print d,e,"\n",$0,NR}}' input_file

I use getline to check whether the next line has 'G'. The problem is that getline consumes that next line, so the main loop then continues from the third line and some pairs are missed. For example, if column 2 of the input is:

Line 1: G
Line 2: C
Line 3: G
Line 4: C

The expected output is lines 2 and 3. However, because getline already consumed line 2, awk jumps from line 1 straight to line 3 instead of going line by line, so line 2 is never compared with line 3 and nothing is printed.
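To see the skipping directly, here is a minimal sketch that mimics the getline pattern from the question and only prints which record numbers get compared together. The function name show_getline_pairs is hypothetical, made up for illustration.

```shell
# Hypothetical helper: mimics the question's getline pattern and prints
# which record numbers end up being compared together.
show_getline_pairs() {
    awk '{
        first = NR      # record read by the main loop
        getline         # reads the NEXT record; the main loop will not see it again
        print first "-" NR
    }'
}

# Column 2 from the question: G, C, G, C
printf 'G\nC\nG\nC\n' | show_getline_pairs
# prints:
# 1-2
# 3-4
```

Records are consumed in fixed, non-overlapping pairs (1-2, 3-4), so a C on line 2 followed by a G on line 3 can never be seen as a pair with this approach.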

Kind regards!

James Brown · Accepted Answer

Man, I understood that completely wrong at first. I hope I got it right this time.

$ awk '
$2=="G" && $4=="CG" && p2=="C" && p4=="CG" {
    print p ORS $0
}
{
    p=$0
    p2=$2
    p4=$4
}' file

Output:

chr1    C   10469   CG  CG  0.0 0   1
chr1    G   10470   CG  CG  0.0 0   8
chr1    C   10471   CG  CG  0.0 0   1
chr1    G   10472   CG  CG  1.0 8   8
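Regarding the question's concern that the 'C' can sit on either an even or an odd line: a minimal sketch that wraps the same awk program in a shell function (pair_cg is a hypothetical name) and feeds it the G, C, G, C column from the question. The other columns are made-up placeholders, not real data.

```shell
# Hypothetical wrapper around the accepted answer's awk program.
pair_cg() {
    awk '
    $2=="G" && $4=="CG" && p2=="C" && p4=="CG" { print p ORS $0 }
    { p=$0; p2=$2; p4=$4 }'
}

# Column 2 is G, C, G, C as in the question; the other columns are
# made-up placeholders for illustration.
printf 'chr1 G 1 CG\nchr1 C 2 CG\nchr1 G 3 CG\nchr1 C 4 CG\n' | pair_cg
# prints:
# chr1 C 2 CG
# chr1 G 3 CG
```

Because the previous line is remembered in p, p2 and p4 instead of being read ahead with getline, every line is compared with the one before it, so the C on line 2 is paired with the G on line 3.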

Explained:

awk '
$2=="G" &&            # column 2 of the current line is G
$4=="CG" &&           # and column 4 of the current line is CG
p2=="C" &&            # column 2 of the previous line is C
p4=="CG" {            # and column 4 of the previous line is CG
    print p ORS $0    # print the pair: previous line, then current line
}
{
    p=$0              # the current record becomes "previous" for the next one
    p2=$2             # same goes for column 2
    p4=$4             # and column 4
}' file
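The script above prints the matched lines exactly as they appear in the input, so the output is tab-delimited only if the input already is. If the input columns are separated by spaces instead (an assumption; the question does not say), one option is to rebuild every record with OFS set to a tab before storing or printing it. This is a sketch, not part of the accepted answer, and pair_cg_tab is a hypothetical name.

```shell
# Hypothetical helper: same logic as the accepted answer, but every record is
# first rebuilt with OFS set to a tab, so the printed pairs are tab-delimited
# even if the input columns are separated by spaces.
pair_cg_tab() {
    awk -v OFS='\t' '
    { $1 = $1 }          # assigning a field rebuilds $0 using OFS (a tab)
    $2 == "G" && $4 == "CG" && p2 == "C" && p4 == "CG" {
        print p ORS $0   # previous (C) line, then current (G) line
    }
    {
        p = $0; p2 = $2; p4 = $4   # remember this (rebuilt) record for the next one
    }'
}

printf 'chr1 C 10469 CG CG 0.0 0 1\nchr1 G 10470 CG CG 0.0 0 8\n' | pair_cg_tab
```

The `{ $1 = $1 }` rule has no pattern, so it runs on every line before the matching rule; that is what makes both stored and printed records use tabs.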
