Reputation: 93
I have a bunch of text files that need cleaning. I'm using bash on UNIX, so an AWK or grep solution would be fine.
The text files look something like this:
1766 1789
1764 1790
1762 1849
0
1357 1817
1366 1857
0
360 42
352 95
0
293 142
302 181
delete-this
0
302 181
0
I want to delete all rows containing "0" or "delete-this", as well as any block that has only one row with two columns, or three rows with two columns.
The result should look like this:
1766 1789
1762 1849
1357 1817
1366 1857
360 42
352 95
293 142
302 181
Thanks a lot!
More info: The sum of row 1 column 2 and row 2 column 2 should be >1, if not, row 2 must be deleted.
Upvotes: 0
Views: 423
Reputation: 41446
This was a tough nut to crack, or at least difficult to understand, but here we go:
awk '/[0-9]+ [0-9]+/ {a[++t]=$0;b[t]=$2;next} {if (t>=2) for (i=1;i<=t;i++) {if (b[i]-c!=1) print a[i];c=b[i]};t=0}' file
1766 1789
1762 1849
1357 1817
1366 1857
360 42
352 95
293 142
302 181
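Since the question mentions a bunch of files, here is a minimal sketch of running the same command over each of them. The function names `clean_file` and `clean_all`, the `*.txt` glob and the `.clean` output suffix are assumptions made for illustration, not part of the question or the answer.

```shell
#!/bin/sh
# clean_file is a hypothetical wrapper around the awk command shown above.
clean_file() {
    awk '/[0-9]+ [0-9]+/ {a[++t]=$0;b[t]=$2;next} {if (t>=2) for (i=1;i<=t;i++) {if (b[i]-c!=1) print a[i];c=b[i]};t=0}' "$1"
}

# clean_all (also hypothetical) writes a cleaned copy next to every *.txt file
# in the given directory; the original files are left untouched.
clean_all() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue            # nothing matched the glob
        clean_file "$f" > "${f%.txt}.clean"
    done
}
```

Calling `clean_all .` would process the current directory. Writing to a separate file rather than editing in place keeps the raw data available in case the rules need adjusting.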
How it works:
awk '
/[0-9]+ [0-9]+/ {              # if the line contains two numbers separated by a space, then
    a[++t]=$0                  # increment counter "t" and store the line in array "a"
    b[t]=$2                    # store column 2 in array "b"
    next                       # go to the next line
}
{                              # any other line ("0", "delete-this") ends the current block
    if (t>=2)                  # if the block holds two or more lines with numbers, then
        for (i=1;i<=t;i++) {   # loop through all stored lines
            if (b[i]-c!=1)     # unless column 2 is exactly 1 more than on the previous line
                print a[i]     # print the stored line
            c=b[i]             # remember this column 2 value in variable "c"
        }
    t=0                        # reset counter "t"
}' file
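Two edge cases in the script above may matter for other inputs: `c` keeps its value from one block to the next, so the first row of a block is compared against the last row of the previous block, and a block at the very end of the file is only printed if a non-matching line follows it. The following is a sketch of one possible variant, not the original answer; the names `clean_blocks` and `emit_block` are hypothetical, and it assumes that a row should only ever be compared with the row directly before it in the same block.

```shell
#!/bin/sh
# clean_blocks is a hypothetical wrapper; it reads the named files or stdin.
clean_blocks() {
    awk '
    # emit_block (hypothetical helper) prints the buffered block if it has
    # two or more rows, then resets the counter.
    function emit_block(   i) {
        if (t >= 2) {
            for (i = 1; i <= t; i++) {
                # The first row is always kept; a later row is dropped when
                # its column 2 is exactly 1 above column 2 of the row before.
                if (i == 1 || b[i] - b[i-1] != 1)
                    print a[i]
            }
        }
        t = 0
    }
    /[0-9]+ [0-9]+/ { a[++t] = $0; b[t] = $2; next }   # buffer two-number rows
    { emit_block() }                                    # any other row ends the block
    END { emit_block() }                                # also handle a trailing block
    ' "$@"
}
```

On the sample input from the question this variant prints the same eight rows as the original command. The difference only shows up when a block starts with a column 2 value that is 1 above the end of the previous block, or when the file does not end with a "0" row.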
Upvotes: 2