Reputation: 765
If I have one dimension data such as
v1 =: 5 $ i.5
v1
0 1 2 3 4
then v1 -"1 0 v1
gives me the Euclidean vectors like
v1 -"1 0 v1
0 1 2 3 4
_1 0 1 2 3
_2 _1 0 1 2
_3 _2 _1 0 1
_4 _3 _2 _1 0
We can also find Euclidean distance matrix easily.
This is how I find the Euclidean vectors and distance matrix of a 2D vector
v2=: <"1 ? 5 2 $ 10
v2
┌───┬───┬───┬───┬───┐
│4 0│4 5│5 7│8 3│6 0│
└───┴───┴───┴───┴───┘
direction_vector=: <"1 @: (-"0 @:(-/"2 @: (>"0 @: (diff))))
distance =: +/"1 @: *: @: (>"2 @:(direction_vector))
m =: 3 : '(i.(#y)) distance"0 _ y'
m v2
0 25 50 25 4
25 0 5 20 29
50 5 0 25 50
25 20 25 0 13
4 29 50 13 0
However, my problem is that I am not sure how to find the Euclidean vectors and distance of 2D data in a smart and clean way
As you can see from the table below, my algorithm took more than 1/3 of time to calculate the direction vectors of data. 14.5 second is not bad, but the problem arises when I have a bigger dataset.
Time (seconds)
+----------------+------+--------+---------+-----+----+------+
|name |locale|all |here |here%|cum%|rep |
+----------------+------+--------+---------+-----+----+------+
|direction_vector|base |6.239582| 5.105430| 35.2| 35 |773040|
|move |base |9.741510| 1.753868| 12.1| 47 | 3390|
|script |base |1.969949| 1.443148| 9.9| 57 | 18|
|distance |base |5.650358| 1.318022| 9.1| 66 |579780|
|enclosestrings |pcsv |1.491832| 1.255603| 8.6| 75 | 1|
|diff |base |1.134585| 1.134585| 7.8| 83 |773186|
|makedsv |pcsv |1.728721| 0.236883| 1.6| 84 | 4|
|norm |base |0.221794| 0.221794| 1.5| 86 | 3390|
|xpt |base |0.194896| 0.194896| 1.3| 87 | 3390|
|ypt |base |0.193579| 0.193579| 1.3| 89 | 3390|
|writedsv |pcsv |2.067408| 0.186687| 1.3| 90 | 4|
|cosd |base |0.172359| 0.172359| 1.2| 91 | 113|
|[rest] | | | 1.300733| 9.0|100 | |
|[total] | | |14.517587|100.0|100 | |
+----------------+------+--------+---------+-----+----+------+
I think I can definitely simplify direction_vector one by using rank, but I got stuck. I tried "2 1
"1 1
"1 2
"_ 1
"1 0
..., but none of them gave me a clear result.
Can anyone help me on this issue? Thank you!
Upvotes: 1
Views: 102
Reputation: 4302
New answer so that I could format the result properly.
Looking at your result I think that there is further room for improvement at least by my measurements. Getting away from the tacit approach and moving to a sequential application with the change of +/ to +/"1 gives roughly twice the speed.
[ v2=: 5 2 $ 4 0 4 5 5 7 8 3 6 0
4 0
4 5
5 7
8 3
6 0
(10000) 6!:2 '+/@:*:@:-"1/~ v2'
8.4647e_6
(10000) 6!:2 '+/"1 *: -"1/~ v2'
3.1289e_6
(+/"1 *: -"1/~ v2) -: (+/@:*:@:-"1/~ v2)
1
+/"1 *: -"1/~ v2
0 25 50 25 4
25 0 5 20 29
50 5 0 25 50
25 20 25 0 13
4 29 50 13 0
Upvotes: 1
Reputation: 4302
I would begin with noticing that v2 u/ v2
makes a table out of the items of v2
by applying u
between those items. Also, you can simplify this a bit by using u/~ v2
which is the same as v2 u/ v2
. The next question is what is u
, but before we go there, boxing really slows things down and you don't actually have to box the vector for this to work since the items can already be written like this:
[ v2=: 5 2 $ 4 0 4 5 5 7 8 3 6 0
4 0
4 5
5 7
8 3
6 0
and this makes the items the vectors, which is what you want to be able to use u/~ v2
Now, back to the question of what we want u
to be. We are going to be working on the items of v2
Since u
is being fed the items of v2
to make the table, you would like to subtract the items from each other as vectors (rank 1) and then square them and then add them together. Translating this into J you get +/@:*:@:-"1
as u
+/@:*:@:-"1/~ v2
0 25 50 25 4
25 0 5 20 29
50 5 0 25 50
25 20 25 0 13
4 29 50 13 0
If you time this I hope that you will find it much faster than your solution because it does not require boxing. The key area that you were concerned about with rank was that it be applied after the Table adverb /
Hope this helps even though it is a slightly different approach and let me know what your timings look like.
Upvotes: 1