Reputation: 1138
Let's say I have some numerical data in columns, something like
11.100000 36.829657 6.101642
11.400000 36.402069 5.731998
11.700000 35.953025 5.372652
12.000000 35.482082 5.023737
12.300000 34.988528 4.685519
12.600000 34.471490 4.358360
12.900000 33.930061 4.042693
13.200000 33.363428 3.738985
13.500000 32.770990 3.447709
13.800000 32.152473 3.169312
I also have a single target value and a column index. Given this set of data, I want to find the closest value to the target value in the column with the specified index.
For example, If my target value is 11.6
in column 1
, then the script should output 11.7
. If there are two numbers equidistant from the target value, then the higher value should be output.
I have a feeling that awk has the necessary functionality to do this, but any solution that works in a bash script is welcome.
Upvotes: 3
Views: 5113
Reputation: 195039
try this:
awk -v c=2 -v t=35 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' file
the -v c=2
and -v t=35
could be dynamic value. they are the column idx (c
) and your target value (t
). in the above line, the parameter is column 2 and target 25. They could be shell variable.
the output of above line based on given input data is:
kent$ awk -v c=2 -v t=35 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' f
34.988528
kent$ awk -v c=1 -v t=11.6 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' f
11.700000
EDIT
If there are two numbers equidistant from the target value, then the higher value should be output
The above codes didn't check this requirement.... the below one should work:
awk -v c=1 -v t=11.6 '{a[NR]=$c}END{
asort(a);d=a[NR]-t;d=d<0?-d:d;v = a[NR]
for(i=NR-1;i>=1;i--){
m=a[i]-t;m=m<0?-m:m
if(m<d){
d=m;v=a[i]
}
}
print v
}' file
test:
kent$ awk -v c=1 -v t=11.6 '{a[NR]=$c}END{
asort(a);d=a[NR]-t;d=d<0?-d:d;v = a[NR]
for(i=NR-1;i>=1;i--){
m=a[i]-t;m=m<0?-m:m
if(m<d){
d=m;v=a[i]
}
}
print v
}' f
11.700000
short explanation.
I won't explain each line of code, what it does. just tell a bit the idea to do the job.
v
, and calculate the diff between it and the given target, save it(absolute value) in d
d
, overwrite d
with diff, also save current element into v
some note:
d(abs)
, if new diff > d, we can stop the loop.O(nlogn)
. in fact this problem could be solved by O(n)
. If your input data were huge, and with a worst case(e.g. your column has value in range 500-99999999999
, but your target is 1.) you may want to avoid the sort. but I assume the performance is not an issue by you.Upvotes: 9
Reputation: 289555
Let's try another way, although Kent's answer must be shorter and sharper :)
awk -vc=1 -vv=13.6 '
BEGIN{l=$c; ld=99}
{d=($c-v>=0) ? ($c-v) : v-$c; if (d <= ld) {ld=d; l=$c}}
END{print l}' file
We provide the c
(=column) and v
(=value) parameters in the beginning.
Then we keep track of the lower value l
and the lowest distance ld
. For each value we calculate the distance d
to the value and if it is lower to the previous ld
, we swap and save the new minimal value in l
. Finally we print l
.
The d=($c-v>=0) ? ($c-v) : v-$c
is a way to save the distance as a absolute value: if c-v
is negative, save it as positive. It is based on the value=(condition) ? if yes : else
structure.
$ awk -vc=2 -vv=13.6 'BEGIN{l=$c; ld=99} {d=($c-v>=0) ? ($c-v) : v-$c; if (d <= ld) {ld=d; l=$c}} END{print l}' file
32.152473
$ awk -vc=3 -vv=10.6 'BEGIN{l=$c; ld=99} {d=($c-v>=0) ? ($c-v) : v-$c; if (d <= ld) {ld=d; l=$c}} END{print l}' file
3.169312
Upvotes: 2
Reputation: 241828
Perl solution:
#!/usr/bin/perl
use warnings;
use strict;
@ARGV == 2 or die "Usage: closest column value < input\n";
my ($column, $target) = (shift, shift);
my $closest;
while (<>) {
my $value = (split)[$column - 1];
if ($. == 1
or abs($closest - $target) > abs($target - $value)
or abs($closest - $target) == abs($target - $value)
&& $value > $closest) {
$closest = $value;
}
}
print $closest, "\n";
Note that using float == float might not work (What Every Computer Scientist Should Know About Floatin-Point Arithmetic). You might need something like abs(abs($closest - $target) - abs($target - $value)) < 1e-14
.
Upvotes: 1