Reputation: 49
I am working on implementing k-Means in R.
I compute my feature vectors from individual files and put them all into a bag, which I called "holder", in this fashion:
holder[[filename]] <- c(featureVector)
I can then recover a given feature vector if I write:
holder[["file3453.txt"]]
or
holder[[number]]
I will be using the feature vectors for centroids and some other computation, so assuming that I have a feature vector V, how do I get the name of the file from holder?
This question could also be interpreted as:
Given the value (feature vector), how can I determine the key (filename)?
Upvotes: 1
Views: 705
Reputation: 49448
But why lose the connection between label and vector in the first place, only to need a reverse lookup later? Just keep them together, and you won't have this problem:
library(data.table)
dt = data.table(filename = c('a', 'b'), feature = list(c(1,2,3), c(2,3,4)))
dt
#    filename feature
# 1:        a   1,2,3
# 2:        b   2,3,4
# possibly set the key to filename for fast access
setkey(dt, filename)
dt['a'] # this does a fast binary search lookup in data.tables
# modify the feature in some way for each filename (awesome_func is a placeholder for your own function):
dt[, feature := awesome_func(feature), by = filename]
# etc
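The same idea of keeping label and vector together can be sketched in base R for the k-Means use case. This is only an illustration under assumptions: the holder contents and the two centroids below are made up, and the nearest-centroid step is a plain squared Euclidean distance rather than anything taken from the question.

```r
# Hypothetical holder, built as in the question: file name -> feature vector.
holder <- list(file1 = c(1, 0, 1, 0),
               file2 = c(1, 1, 1, 1),
               file3 = c(0, 0, 1, 0))

# Stack the vectors into a matrix; the list names become the row names.
features <- do.call(rbind, holder)

# Made-up centroids, one per row.
centroids <- rbind(c(1, 0, 1, 0),
                   c(1, 1, 1, 1))

# Assign each row to its nearest centroid (squared Euclidean distance).
# apply() keeps the row names, so every assignment stays labelled by file.
nearest <- apply(features, 1, function(v) {
  which.min(colSums((t(centroids) - v)^2))
})

# Files assigned to the first centroid, with no reverse lookup needed.
names(nearest)[nearest == 1]
```

Because the file names travel with the rows, any per-cluster computation can index by name instead of searching for a matching vector.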
Upvotes: 2
Reputation: 3462
To extend nograpes' solution: if you want to build a reverse map, you can do the following:
#this function converts a feature vector to a string that can be used as a key
#there may be other, better ways to do that, but this one is simple.
#it works best when your feature vectors are short and contain integers
#it may not work at all due to precision issues for real numbers
my.to.string = function(x) paste(x,collapse="_")
When building your holder, do this:
holder[[filename]] = featureVector #updates the holder
reverseMap[[my.to.string(featureVector)]] = filename # updates the reverse map
Now, to get your task done, just do:
my.file = reverseMap[[my.to.string(my.feature)]]
This is straightforward and will work for simple cases. It cannot replace a real hashcode-based data structure, which I haven't yet seen or needed in R.
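For a hashed container, R environments can play that role, since they are looked up by name through a hash table when created with hash = TRUE. The following is a minimal sketch of the reverse map on top of an environment; the file names and vectors are made up, and it inherits the same caveats about string keys for real numbers.

```r
# Same key function as above.
my.to.string <- function(x) paste(x, collapse = "_")

holder <- list()
reverseMap <- new.env(hash = TRUE)  # environment used as a string-keyed hash map

# Made-up input: file name -> feature vector.
files <- list("file1.txt" = c(1, 0, 1, 0),
              "file2.txt" = c(1, 1, 1, 1))

for (filename in names(files)) {
  featureVector <- files[[filename]]
  holder[[filename]] <- featureVector                        # forward map
  reverseMap[[my.to.string(featureVector)]] <- filename      # reverse map
}

# Reverse lookup; an unknown vector gives NULL instead of an error.
my.file <- reverseMap[[my.to.string(c(1, 1, 1, 1))]]
missing.file <- reverseMap[[my.to.string(c(9, 9))]]
```

Note that two files with the same feature vector share one key, so the later file overwrites the earlier one; storing a character vector of names per key would be one way around that.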
Upvotes: 2
Reputation: 18323
You should know that lists are not implemented with a hashtable in R. Also, there is no efficient way of doing what you want: you would either have to maintain a reverse lookup list, or just scan for matching indices. For example,
# Test data.
holder<-list(`file1`=c(1,0,1,0),`file2`=c(1,1,1,1),`file3`=c(1,0,1,0))
# Find this feature.
feature<-c(1,0,1,0)
# Find all indices that have this feature vector.
# The length check avoids recycling when vectors differ in length.
names(holder)[sapply(holder,function(x)length(x)==length(feature)&&all(x==feature))]
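If the feature vectors hold real numbers, an exact comparison can miss vectors that differ only by rounding error. A hedged variant of the scan with a tolerance might look like this; find.files and its tol argument are hypothetical names introduced here, and the test data is made up.

```r
# Made-up data: file1 and file3 differ only by floating-point rounding.
holder <- list(file1 = c(0.1 + 0.2, 1),
               file2 = c(1, 1),
               file3 = c(0.3, 1))
feature <- c(0.3, 1)

# find.files is a hypothetical helper, not part of base R.
# It returns the names of all entries within tol of the given vector.
find.files <- function(holder, feature, tol = 1e-8) {
  hit <- vapply(holder, function(x) {
    length(x) == length(feature) && all(abs(x - feature) <= tol)
  }, logical(1))
  names(holder)[hit]
}

approx.match <- find.files(holder, feature)

# Exact match for comparison, using identical() on each entry.
exact.match <- names(holder)[vapply(holder, identical, logical(1), feature)]
```

Either way the scan is linear in the number of files, so it suits occasional lookups better than calls inside a tight loop.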
Upvotes: 1