Delete rows according to condition

Question

Using columns 1 and 2 as keys, I want to delete all rows in which the value increments by one.

input

1000 1001 140
1000 1002 140
1000 1003 140
1000 1004 140
1000 1005 140
1000 1006 140
1000 1201 140
1000 1202 140
1000 1203 140
1000 1204 140
1000 1205 140
2000 1002 140
2000 1003 140
2000 1004 140
2000 1005 140
2000 1006 140

desired output

1000 1001 140
1000 1006 140
1000 1201 140
1000 1205 140
2000 1002 140
2000 1006 140

I have tried

awk '{if (a[$1] < $2)a[$1]=$2;}END{for(i in a){print i,a[i];}}'

But for some reason, it keeps only the maximum value.
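The attempt keeps only the maximum because `a[$1]` has exactly one slot per value of column 1. Every row with a larger column 2 overwrites that slot, and the END loop then prints one line per key. Column 3 is never stored, so it disappears as well. The sketch below wraps the question's command in a shell function to make this visible. The function name `max_per_key` is made up for illustration. Because `for (i in a)` visits keys in an unspecified order, the output is piped through `sort`.

```shell
# max_per_key (hypothetical name): the command from the question, unchanged
# apart from layout. a[$1] holds a single value per key and is overwritten
# whenever a row has a larger $2, so only the maximum survives. Column 3 is
# never stored, so it is dropped. for (i in a) has no defined order, hence sort.
max_per_key() {
  awk '{ if (a[$1] < $2) a[$1] = $2 }
       END { for (i in a) print i, a[i] }' "$@" | sort
}
```

On the full input above, `max_per_key dat` prints `1000 1205` and `2000 1006`: one line per key, no ranges.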

James K. Lowden · Accepted Answer

Your problem statement doesn't describe your output. You want to print the first and last row of each contiguous range. Like this:

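$ awk '$1 > A || $2 > B + 1 {   # key increased, or column 2 jumped by more than 1: a new range starts
         if (row) print row      # print the last row of the previous range
         print                   # print the first row of the new range
       }
       { A = $1; B = $2; row = $0 }   # remember this row for the next comparison
       END { print }                  # $0 still holds the last row here: it ends the final range
     ' dat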

1000 1001 140
1000 1006 140
1000 1201 140
1000 1205 140
2000 1002 140
2000 1006 140

The basic problem is just to determine whether a line is only 1 more than the prior one. To do that you need both lines at hand. By storing the values of each line as it is read (A, B and row above), you can compare the current line to the prior one.
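Tracing the command above, it appears to print a range twice when that range is a single row (for example `1000 1001 140` followed by `1000 1005 140`). That row is printed once as the start of its range and again as the "last row" when the next range begins. It also assumes the file is sorted by column 1, since only an increase in column 1 counts as a new key. What follows is a sketch of a variant, not the answer's own code. It records the line number (`NR`) where each range started and prints the closing row only if it differs from the start. It also treats any change in column 1 as a new range. The function name `first_last_of_runs` and the variables `key`, `last`, `prev` and `start` are hypothetical.

```shell
# first_last_of_runs (hypothetical name): print the first and last row of each
# run in which column 1 stays the same and column 2 grows by exactly 1.
# Reads the files given as arguments, or stdin if there are none.
first_last_of_runs() {
  awk '
    # A run starts on the first row, when column 1 changes, or when
    # column 2 is not exactly one more than on the previous row.
    NR == 1 || $1 != key || $2 != last + 1 {
      if (NR > 1 && NR - 1 != start) print prev   # close the previous run unless it was a single row
      print                                       # open the new run
      start = NR                                  # remember where this run began
    }
    { key = $1; last = $2; prev = $0 }            # remember this row for the next comparison
    END { if (NR > 0 && NR != start) print prev } # close the final run unless it was a single row
  ' "$@"
}
```

Run as `first_last_of_runs dat`, it should produce the same six lines as above for the sample input. A run made of a single row is printed only once.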
