Reputation: 157
I have a vector of times in milliseconds that looks like this:
vector <- c(667753, 671396, 675356, 679286, 683413, 687890, 691742,
695651, 700100, 704552, 708832, 713117, 717082, 720872, 725002, 729490,
733824, 738233, 742239, 746092, 750003, 754236, 867342, 870889, 873704,
876617, 879626, 882595, 885690, 888602, 891789, 894717, 897547, 900797,
903615, 906646, 909624, 912613, 915645, 918566, 921792, 924625, 927538,
930721, 933542)
Now I want to search a large data frame with many time columns for the single column whose values are closest (row-wise) to the values in my vector.
Every column in the data frame has the same number of rows as the vector has elements. So if my vector has 240 elements, every column in the larger data frame has 240 rows.
Any idea how to do this?
Upvotes: 1
Views: 127
Reputation: 171
You can calculate the Euclidean distance between your vector and each column of the data frame, then check which column has the smallest distance:
which.min(sapply(seq_len(ncol(dataFrame)), function(i) sqrt(sum((v - dataFrame[, i])^2))))
This returns the index of the column with the smallest distance.
Here dataFrame is the data frame whose columns hold the different times (each column is compared to the vector) and v is your vector.
The inner expression is just the square root of the sum of squared differences (the Euclidean distance):
sqrt(sum((v - dataFrame[, i])^2))
You can also use the sum of absolute differences (the Manhattan distance) as a distance measure:
sum(abs(v - dataFrame[, i]))
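As a small illustration of the two distance measures, here is a sketch on made-up data. The column names colA to colC and all the values are assumptions for the example, not taken from the question.

```r
# Toy data: all values are invented for illustration only.
v <- c(1000, 2000, 3000, 4000)
dataFrame <- data.frame(
  colA = c(1500, 2500, 3500, 4500),  # every row off by 500
  colB = c(1010, 1990, 3005, 4002),  # every row within 10 of v
  colC = c(900, 2100, 2900, 4100)    # every row off by 100
)

# Euclidean distance between v and each column
euclid <- sapply(dataFrame, function(col) sqrt(sum((v - col)^2)))

# Sum of absolute differences (Manhattan distance) between v and each column
manhattan <- sapply(dataFrame, function(col) sum(abs(v - col)))

# Name of the closest column under each measure
closest_euclid <- names(which.min(euclid))
closest_manhattan <- names(which.min(manhattan))
```

Iterating over the data frame directly with sapply keeps the column names, so which.min returns a named index and you can read off the column name instead of a bare position.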
EDIT
As Evan Friedland pointed out, you can actually just use:
which.min(colSums(abs(v - dataFrame)))
or
which.min(sqrt(colSums((v - dataFrame)^2)))
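A minimal sketch of the vectorised form on made-up data follows. The column names and values are assumptions for the example. It relies on the vector having exactly as many elements as the data frame has rows, so the sketch checks that first.

```r
# Toy data: all values are invented for illustration only.
v <- c(1000, 2000, 3000, 4000)
dataFrame <- data.frame(
  colA = c(1500, 2500, 3500, 4500),
  colB = c(1010, 1990, 3005, 4002),
  colC = c(900, 2100, 2900, 4100)
)

# The vectorised form relies on v lining up with the rows of dataFrame.
stopifnot(length(v) == nrow(dataFrame))

# v - dataFrame subtracts each column from v, so colSums() gives one total per column.
manhattan <- colSums(abs(v - dataFrame))
euclid <- sqrt(colSums((v - dataFrame)^2))

best <- which.min(manhattan)  # named integer: position of the closest column
best_name <- names(best)      # the name of that column
```

If the data frame also contains non-time columns, you would probably want to subset it to the time columns first, since colSums needs every column to be numeric.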
Upvotes: 3