user2827159

Reputation: 49

Hashed lists in R: recovering a key given a value (vector)

I am working on implementing k-means in R.

I compute my feature vectors from individual files and put them all into a bag, which I called "holder", in this fashion:

holder[[filename]] <- c(featureVector)

I can then recover a given feature vector if I write:

holder[["file3453.txt"]] or holder[[number]].

I will be using the feature vectors for centroids and some other computations. So, assuming that I have a feature vector V, how do I get the name of its file from holder?

This question could also be interpreted as:

Given the value (feature vector), how can I determine the key (filename)?

Upvotes: 1

Views: 705

Answers (3)

eddi

Reputation: 49448

But why lose the connection between label and vector in the first place, so that you need a reverse lookup? Just keep them together, and you won't have this problem:

library(data.table)

dt = data.table(filename = c('a', 'b'), feature = list(c(1,2,3), c(2,3,4)))
dt
#   filename feature
#1:        a   1,2,3
#2:        b   2,3,4

# possibly set the key to filename for fast access
setkey(dt, filename)
dt['a']    # this does a fast binary search lookup in data.tables

# modify the feature in some way for each filename
# (awesome_func is a placeholder for your own function):
dt[, feature := awesome_func(feature), by = filename]

# etc
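The same idea can be sketched in base R without data.table: if every step of the computation works on the named list, the filename travels with each result and nothing ever has to be looked up in reverse. This is a minimal sketch; the `centroids` matrix and the squared-Euclidean assignment step are illustrative assumptions, not part of the answer above.

```r
# Feature vectors keyed by filename (illustrative data).
holder <- list(file1 = c(1, 0, 1, 0),
               file2 = c(1, 1, 1, 1),
               file3 = c(0, 0, 1, 0))

# Hypothetical centroids, one per row.
centroids <- rbind(c(1, 0, 1, 0),
                   c(1, 1, 1, 1))

# For each file, the index of the nearest centroid by squared Euclidean distance.
# t(centroids) has one centroid per column; subtracting v recycles it down each column.
# sapply() keeps the names of `holder`, so the result is keyed by filename.
nearest <- sapply(holder, function(v) which.min(colSums((t(centroids) - v)^2)))

nearest
# file1 file2 file3
#     1     2     1

# Files assigned to cluster 1, found without any reverse lookup.
names(nearest)[nearest == 1]
```

Because the assignment result is indexed by the same names as `holder`, any later step (recomputing centroids, reporting clusters) can recover filenames directly.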

Upvotes: 2

amit

Reputation: 3462

To extend nograpes' solution: if you want to build a reverse map, you can do the following:

# This function converts a feature vector into a string key.
# There may be other, better ways to do that, but this one is simple.
# It works best when your feature vectors are short and contain integers;
# it may not work at all for real numbers, due to precision issues.
my.to.string = function(x) paste(x, collapse = "_")

When building your holder list, do this:

holder = list()       # initialize both containers once, before filling them
reverseMap = list()

holder[[filename]] = featureVector                       # updates the holder
reverseMap[[my.to.string(featureVector)]] = filename     # updates the reverse map

Now, to get your task done, just do:

my.file = reverseMap[[my.to.string(my.feature)]]

This is straightforward and will work for simple cases. It cannot replace a real hash-code-based data structure, which I haven't yet seen or needed in R.
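R environments are hashed lookup tables, so an environment created with `new.env(hash = TRUE)` can stand in for the `reverseMap` list using the same `[[` syntax. A minimal sketch, assuming the same string-key idea as `my.to.string`; the `files` list is illustrative data. It also appends filenames instead of overwriting them, because with a plain assignment two files with identical feature vectors would leave only the last filename in the map.

```r
# Same string-key idea as in the answer: join the elements with "_".
my.to.string <- function(x) paste(x, collapse = "_")

holder <- list()
reverseMap <- new.env(hash = TRUE)   # an environment used as a hash table

# Illustrative input: file1 and file3 share the same feature vector.
files <- list(file1 = c(1, 0, 1, 0),
              file2 = c(1, 1, 1, 1),
              file3 = c(1, 0, 1, 0))

for (filename in names(files)) {
  featureVector <- files[[filename]]
  holder[[filename]] <- featureVector
  key <- my.to.string(featureVector)
  # Append instead of overwrite, so identical vectors keep every filename.
  # reverseMap[[key]] is NULL for a new key, and c(NULL, x) is just x.
  reverseMap[[key]] <- c(reverseMap[[key]], filename)
}

reverseMap[[my.to.string(c(1, 0, 1, 0))]]   # "file1" "file3"
reverseMap[[my.to.string(c(0, 0, 0, 0))]]   # NULL: no file has this vector
```

The string key still carries the precision caveat from the comments on `my.to.string`: real-valued vectors that differ only by rounding produce different keys.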

Upvotes: 2

nograpes

Reputation: 18323

You should know that lists are not implemented with a hash table in R. Also, there is no efficient way of doing what you want: you would either have to maintain a reverse lookup list, or just scan for matching indices. For example:

# Test data.
holder<-list(`file1`=c(1,0,1,0),`file2`=c(1,1,1,1),`file3`=c(1,0,1,0))
# Find this feature.
feature<-c(1,0,1,0)
# Find all indices that have this feature vector.
names(holder)[sapply(holder,function(x)all(x==feature))]
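Because k-means centroids are usually real-valued, an exact `==` test can miss vectors that differ only by floating-point rounding. A minimal variation of the same linear scan, assuming a numeric tolerance is acceptable for your data; `find_files` and `tol` are hypothetical names. `all.equal()` compares by mean relative difference and returns a description string rather than `FALSE` on a mismatch, hence the `isTRUE()` wrapper.

```r
# Test data, as in the answer.
holder <- list(file1 = c(1, 0, 1, 0), file2 = c(1, 1, 1, 1), file3 = c(1, 0, 1, 0))

# Hypothetical helper: names of all entries within `tol` of `feature`.
find_files <- function(holder, feature, tol = 1e-8) {
  hits <- vapply(holder, function(x) {
    # Check lengths first so vectors of different sizes never match.
    length(x) == length(feature) &&
      isTRUE(all.equal(x, feature, tolerance = tol))
  }, logical(1))
  names(holder)[hits]
}

find_files(holder, c(1, 0, 1, 0) + 1e-12)   # "file1" "file3"
find_files(holder, c(0, 0, 0, 0))           # character(0)
```

This is still a scan over every entry, so it costs time proportional to the size of `holder`, as the answer says.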

Upvotes: 1
