user1739261

Reputation: 359

awk distance between records

I'm trying to find the distance between records in a text file using awk. An example input is:

1 2 1 4 yes
2 3 2 2 no
1 1 1 5 yes
4 2 4 0 no
5 1 0 1 no

I want to find the distance between each of the numerical values. I'm doing this by subtracting the values and then squaring the result. I tried the code below, but all the distances come out as 0. Any help would be appreciated.

BEGIN { recs = 0; fieldnum = 5; }
{
    recs++;
    for (i = 1; i <= NF; i++) { data[recs,i] = $i; }
}
END {
    for (r = 1; r <= recs; r++) {
        for (f = 1; f < fieldnum; f++) {
            ## find distances
            for (t = 1; t <= recs; t++) {
                distance[r,t] += ((data[r,f] - data[t,f]) * (data[r,f] - data[t,f]));
            }
        }
    }
    for (r = 1; r <= recs; r++) {
        for (t = 1; t < recs; t++) {
            ## print distances
            printf("distance between %d and %d is %d \n", r, t, distance[r,t]);
        }
    }
}

Upvotes: 0

Views: 155

Answers (1)

Ed Morton

Reputation: 203995

I have no idea what you mean conceptually by the "distance between each of the numerical values", so I can't help you with your algorithm, but let's clean up the code and see what it produces:

$ cat tst.awk
{
    for (i = 1; i <= NF; i++) {
        data[NR,i] = $i
    }
}
END {
    for (r = 1; r <= NR; r++) {
        for (f = 1; f < NF; f++) {
            ## find distances
            for (t = 1; t <= NR; t++) {
                delta = data[r,f] - data[t,f]
                distance[r,t] += (delta * delta)
            }
        }
    }
    for (r = 1; r <= NR; r++) {
        for (t = 1; t < NR; t++) {
            ## print distances
            printf "distance between %d and %d is %d\n", r, t, distance[r,t]
        }
    }
}
$
$ awk -f tst.awk file
distance between 1 and 1 is 0
distance between 1 and 2 is 7
distance between 1 and 3 is 2
distance between 1 and 4 is 34
distance between 2 and 1 is 7
distance between 2 and 2 is 0
distance between 2 and 3 is 15
distance between 2 and 4 is 13
distance between 3 and 1 is 2
distance between 3 and 2 is 15
distance between 3 and 3 is 0
distance between 3 and 4 is 44
distance between 4 and 1 is 34
distance between 4 and 2 is 13
distance between 4 and 3 is 44
distance between 4 and 4 is 0
distance between 5 and 1 is 27
distance between 5 and 2 is 18
distance between 5 and 3 is 33
distance between 5 and 4 is 19

Seems to produce some non-zero output...
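
One thing the output above does show: the print loop runs while t<NR, so comparisons against the last record (record 5) are never printed. If that isn't intended, here is a minimal sketch of a variant that prints each pair of records once. It assumes the "distance" you want is the sum of squared differences over the numeric fields, and that the last field of every record is a yes/no label to be skipped. The function name pairwise_sqdist and the variable nnum are made up for illustration and are not from the original code.

```shell
#!/bin/sh
# Sketch only: sum of squared differences between every pair of records.
# pairwise_sqdist is a hypothetical helper name used for illustration.
pairwise_sqdist() {
    awk '
    {
        # Assumption: the last field is a yes/no label, so only the
        # first NF-1 fields are treated as numeric.
        nnum = NF - 1
        for (i = 1; i <= nnum; i++) {
            data[NR, i] = $i
        }
    }
    END {
        for (r = 1; r <= NR; r++) {
            # Start at r+1 so each pair is printed once, and use t <= NR
            # so the last record is included.
            for (t = r + 1; t <= NR; t++) {
                sum = 0
                for (f = 1; f <= nnum; f++) {
                    delta = data[r, f] - data[t, f]
                    sum += delta * delta
                }
                printf "distance between %d and %d is %d\n", r, t, sum
            }
        }
    }' "$@"
}

# Run it on the sample input from the question.
printf '%s\n' '1 2 1 4 yes' '2 3 2 2 no' '1 1 1 5 yes' '4 2 4 0 no' '5 1 0 1 no' |
    pairwise_sqdist
```

On the sample input this prints ten lines, one per pair, including "distance between 1 and 5 is 27" and "distance between 4 and 5 is 19", which the original print loop leaves out. If you want the Euclidean distance rather than its square, you would print sqrt(sum) with a %f format instead.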

Upvotes: 3
