Reputation: 8481
Say I have a pair of factors, X and Y, where X has three levels and Y has four. An example might be:
X = c("red","blue","yellow")
Y = c(1,2,3,4)
That gives 12 combinations of factor levels. For each combination I want to create and store some data, perhaps as a data frame, or maybe as an interpolating function such as a spline. The point is that the data could be arbitrary.
Now, I want to be able to look up the data by using the combinations of factors. I don't know if this is the right way to do it (hence my question), but here's how I thought I could solve this:
dict <- list()
combinations <- expand.grid(X = c("red","blue","yellow"), Y = c(1,2,3,4))
for (i in seq_len(nrow(combinations))) {
  # Build the "X:Y" key once and reuse it
  key <- paste(combinations$X[i], combinations$Y[i], sep = ":")
  # Placeholder value; `[[` stores any object as a single element
  dict[[key]] <- key
}
The result:
> dict
$`red:1`
[1] "red:1"
$`blue:1`
[1] "blue:1"
$`yellow:1`
[1] "yellow:1"
$`red:2`
[1] "red:2"
$`blue:2`
[1] "blue:2"
$`yellow:2`
[1] "yellow:2"
$`red:3`
[1] "red:3"
$`blue:3`
[1] "blue:3"
$`yellow:3`
[1] "yellow:3"
$`red:4`
[1] "red:4"
$`blue:4`
[1] "blue:4"
$`yellow:4`
[1] "yellow:4"
Now if I want to change the value for a specific key, I can do so relatively easily:
dict["red:4"] <- "insert some cool function here"
> dict["red:4"]
$`red:4`
[1] "insert some cool function here"
So, obviously, this is pretty silly if you're just going to have text as the values. But I think it becomes useful if the "values" are actually objects or data frames. What do you all think about this? Is there another easier way to implement this same type of functionality already existing in R that I don't know about?
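To make the "values are actually objects" case concrete, here is a minimal sketch that stores a data frame under every key and then replaces one entry with a spline function. The helper `make_key` and the sample numbers are invented for illustration only. It uses `[[` rather than `[` for assignment, since `[[<-` stores the object as a single list element, whereas `[<-` with a data frame on the right-hand side would not.

```r
# Hypothetical helper: builds the "X:Y" key format used above
make_key <- function(x, y) paste(x, y, sep = ":")

combinations <- expand.grid(X = c("red", "blue", "yellow"), Y = 1:4,
                            stringsAsFactors = FALSE)
keys <- make_key(combinations$X, combinations$Y)

# One placeholder data frame per combination, named by its key
dict <- lapply(keys, function(k) data.frame(key = k, value = NA))
names(dict) <- keys

# Replace one entry with an interpolating spline (made-up sample points)
dict[[make_key("red", 4)]] <- splinefun(1:5, c(2, 4, 3, 5, 1))

# `[[` returns the stored object itself, so it can be called directly
f <- dict[["red:4"]]
f(1)  # the spline evaluated at the first sample point
```

Note that `dict["red:4"]` returns a one-element list, while `dict[["red:4"]]` returns the stored object.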
Upvotes: 1
Views: 318
Reputation: 7109
Just thought I'd add a vectorised version of qwwqwwq's answer.
hash <- function( ) {
new.env( hash = TRUE, parent = emptyenv() )
}
set <- function(key, val, hash) {
invisible(mapply(assign, key, val, MoreArgs = list(envir = hash)))
}
lookup <- function(key, hash, use_names = TRUE) {
sapply(key, get, envir = hash, USE.NAMES = use_names)
}
Which you can then use as follows...
> d = hash()
> set(letters, 1:26, d)
> lookup('z', d)
z
26
> lookup('y', d)
y
25
> lookup(c('x','y','z'), d)
x y z
24 25 26
> lookup(c('x','y','z'), d, FALSE)
[1] 24 25 26
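One caveat, offered as a sketch rather than a fix: `sapply` tries to simplify its result, which is fine for scalars like the ones above but may be awkward when the stored values are data frames or functions. Base R's `mget` does a vectorised lookup and always returns a named list. The keys and data below are made up for illustration.

```r
# A fresh hashed environment holding non-scalar values
d <- new.env(hash = TRUE, parent = emptyenv())
list2env(list(a = data.frame(v = 1:2), b = data.frame(v = 3:4)), envir = d)

# Vectorised lookup; the result is a list named by key
vals <- mget(c("a", "b"), envir = d)

# Missing keys can be mapped to a default instead of raising an error
maybe <- mget(c("a", "zz"), envir = d, ifnotfound = list(NULL))
```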
Upvotes: 1
Reputation: 7329
The problem with your dict is that it's actually a list: look-up by name is linear in the number of elements and will not come close to the performance of an actual hash table. R environments can store their objects in a hash table, so you can just create a new environment with the hash property set to TRUE and use it as you would a hash/dictionary:
hash <- function( ) {
return( new.env( hash = TRUE, parent = emptyenv() ) )
}
set <- function( hash, key, val ) {
assign(key, val, envir = hash)
}
lookup <- function( hash, key) {
return( get(key, envir = hash) )
}
d = hash()
set(d, 'a', 3)
print(lookup(d, 'a'))
## [1] 3
Here's a great resource explaining this in more detail: http://broadcast.oreilly.com/2010/03/lookup-performance-in-r.html
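As a small follow-up sketch, the same kind of environment can hold arbitrary objects under the "X:Y" keys from the question, and base R already provides key-existence checks and key listing. The stored values here are invented for illustration.

```r
d <- new.env(hash = TRUE, parent = emptyenv())

# Values can be any R object: a data frame, a function, ...
assign("red:4", data.frame(x = 1:3, y = c(2.5, 3.1, 4.0)), envir = d)
d[["blue:1"]] <- function(x) x^2   # `[[<-` also works on environments

# Check whether a key is present before calling get()
has_red   <- exists("red:4",   envir = d, inherits = FALSE)
has_green <- exists("green:9", envir = d, inherits = FALSE)

# List all keys currently stored (sorted alphabetically by default)
keys <- ls(d)
```

Calling `get` on a missing key raises an error, so checking with `exists` first (or using `mget` with `ifnotfound`) is one way to handle absent combinations.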
Upvotes: 3