Reputation: 340
I have a list of ranges, and I am trying to merge subsequent entries which lie within a given distance of each other.
In my data, the first column contains the lower bound of the range and the second column contains the upper bound.
The logic is as follows: if the value in column 1 is less than or equal to the value in column 2 of the previous row plus a given value, print the entry in column 1 of the previous row and the entry in column 2 of the current row.
If the two ranges lie within the distance specified by the variable 'dist', they should be merged, else the rows should be printed as they are.
Input:
1 10
9 19
51 60
Desired output with dist=10:
1 19
51 60
Using bash, I've tried things along these lines:
dist=10
awk '$1 -le (p + ${dist}) { print q, $2 } {p=$2;} {q=$1} ' input.txt > output.txt
This returns syntax errors.
Any help appreciated!
Upvotes: 1
Views: 116
Reputation: 3089
Assuming that when the condition holds for two consecutive pairs of records (i.e. three consecutive records in total), the third record should treat the merged output of records 1 and 2 as its previous record:
awk -v dist=10 'FNR==1{prev_1=$1; prev_2=$2; next} ($1<=prev_2+dist){print prev_1,$2; prev_2=$2;next} {prev_1=$1; prev_2=$2}1' file
Input :
$ cat file
1 10
9 19
10 30
51 60
Output:
1 19
1 30
51 60
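Note that the one-liner above prints every intermediate merge (both "1 19" and "1 30"), and a first row that is not merged with the second row is never printed. If only the final merged ranges are wanted, a possible variant is sketched below. It assumes the input is sorted by the first column, and it keeps the larger upper bound when a range is nested inside the current run, which is my assumption rather than something stated in the question. The function name merge_ranges is made up for illustration. dist is passed with -v because shell variables are not expanded inside the single-quoted awk program, and awk compares with <= rather than the shell's -le.

```shell
# merge_ranges DIST [FILE]: hypothetical helper, not from the original post.
# Reads stdin when FILE is omitted. Assumes rows are sorted by column 1.
merge_ranges() {
  awk -v dist="$1" '
    # First row: start the first run.
    NR == 1 { lo = $1; hi = $2; next }
    # Row starts within dist of the current run: extend the run.
    # Keeping the larger upper bound is an assumption for nested ranges.
    $1 <= hi + dist { if ($2 > hi) hi = $2; next }
    # Gap is too large: emit the finished run and start a new one.
    { print lo, hi; lo = $1; hi = $2 }
    # Emit the last run, unless the input was empty.
    END { if (NR > 0) print lo, hi }
  ' "${2:--}"
}
```

Called as merge_ranges 10 file on the four-row file above, this should print "1 30" and "51 60"; on the three-row input from the question it should print "1 19" and "51 60".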
Upvotes: 1