Evan Zamir

Reputation: 8481

Is this a good way to make a multi-dimensional "dictionary" in R?

Say I have a pair of factors, X and Y, where X has three levels and Y has four. An example might be:

X = c("red","blue","yellow")
Y = c(1,2,3,4)

That gives 12 combinations of factor levels. For each combination I want to create and store some data, perhaps a data frame, or maybe an interpolating function like a spline. The point is that the data could be arbitrary.

Now, I want to be able to look up the data by using the combinations of factors. I don't know if this is the right way to do it (hence my question), but here's how I thought I could solve this:

dict <- list()
combinations <- expand.grid(X = c("red","blue","yellow"), Y = c(1,2,3,4))
for (i in seq_len(nrow(combinations))) {
  # Build the composite key, e.g. "red:1"
  key <- paste(combinations$X[i], combinations$Y[i], sep = ":")
  # Placeholder value; [[ ]] stores any single object under the key
  dict[[key]] <- key
}

The result:

> dict
$`red:1`
[1] "red:1"

$`blue:1`
[1] "blue:1"

$`yellow:1`
[1] "yellow:1"

$`red:2`
[1] "red:2"

$`blue:2`
[1] "blue:2"

$`yellow:2`
[1] "yellow:2"

$`red:3`
[1] "red:3"

$`blue:3`
[1] "blue:3"

$`yellow:3`
[1] "yellow:3"

$`red:4`
[1] "red:4"

$`blue:4`
[1] "blue:4"

$`yellow:4`
[1] "yellow:4"

Now if I want to change the value stored under a specific key, I can do so relatively easily:

# [[ ]] replaces one element, so the value can also be a function or data frame
dict[["red:4"]] <- "insert some cool function here"

> dict["red:4"]
$`red:4`
[1] "insert some cool function here"

Obviously, this is pretty silly if the values are just text, but I think it becomes useful when the "values" are actually objects or data frames. What do you all think? Is there an easier way to get this kind of functionality that already exists in R and that I don't know about?

Upvotes: 1

Views: 318

Answers (2)

Tommy O'Dell

Reputation: 7109

Just thought I'd add a vectorised version of qwwqwwq's answer.

hash <- function( ) {
  new.env( hash = TRUE, parent = emptyenv() ) 
}

set <- function(key, val, hash) {
  invisible(mapply(assign, key, val, MoreArgs = list(envir = hash)))
}

lookup <- function(key, hash, use_names = TRUE) {
  sapply(key, get, envir = hash, USE.NAMES = use_names)
}

Which you can then use as follows...

> d = hash()
> set(letters, 1:26, d)
> lookup('z', d)
 z 
26 
> lookup('y', d)
 y 
25 
> lookup(c('x','y','z'), d)
 x  y  z 
24 25 26 
> lookup(c('x','y','z'), d, FALSE)
[1] 24 25 26
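
One caveat: sapply tries to simplify its result, which is handy for scalars but can reshape values such as data frames. Below is a minimal sketch of one alternative, assuming the values are arbitrary objects. It uses base R's list2env to store a named list in one call and mget to fetch several keys as a named list without simplification. The names grid, keys, vals, e, and res are illustrative only, and the stored data frames are placeholders.

```r
# Composite keys for every X/Y combination from the question.
grid <- expand.grid(X = c("red", "blue", "yellow"), Y = 1:4,
                    stringsAsFactors = FALSE)
keys <- paste(grid$X, grid$Y, sep = ":")

# Placeholder values: one small data frame per key.
vals <- lapply(seq_along(keys), function(i) data.frame(id = i, key = keys[i]))
names(vals) <- keys

# Store all key/value pairs at once in a hashed environment.
e <- list2env(vals, envir = new.env(hash = TRUE, parent = emptyenv()))

# Vectorised lookup that always returns a named list.
res <- mget(c("red:4", "blue:2"), envir = e)
```

Here res[["red:4"]] is the data frame stored under that key, untouched by any simplification.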

Upvotes: 1

qwwqwwq

Reputation: 7329

The problem with your dict is that it's actually a list, so look-up by name is linear and will not come close to the performance of an actual hash table. R environments can store their objects using a hash table, so you can just create a new environment with the hash argument set to TRUE and use it as you would a hash/dictionary:

hash <- function( ) {
    return( new.env( hash = TRUE, parent = emptyenv() ) )
}

set <- function( hash, key, val ) {
    assign(key, val, envir = hash)
}

lookup <- function( hash, key) {
    return( get(key, envir = hash) )
}

d = hash()

set(d, 'a', 3)

print(lookup(d, 'a'))
## [1] 3
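
Applied to the colour/number keys from the question, the same idea might look like the sketch below. The names make_key and store are hypothetical, and the stored values (a small data frame and a splinefun interpolator) are made-up placeholders; assign, get, and exists are base R.

```r
# Hypothetical helper: build the composite key used in the question.
make_key <- function(x, y) paste(x, y, sep = ":")

# An environment used as a hash table, as in the code above.
store <- new.env(hash = TRUE, parent = emptyenv())

# Values can be arbitrary objects: a data frame for one key...
assign(make_key("red", 4),
       data.frame(t = 1:3, v = c(2.5, 3.1, 4.8)),
       envir = store)

# ...and an interpolating spline function for another (placeholder data).
assign(make_key("blue", 2), splinefun(x = 1:5, y = (1:5)^2), envir = store)

# Look values up by key.
red4  <- get(make_key("red", 4), envir = store)
blue2 <- get(make_key("blue", 2), envir = store)
blue2(3)  # evaluates the stored spline at x = 3

# get() raises an error for a missing key, so check with exists() first.
has_yellow1 <- exists(make_key("yellow", 1), envir = store, inherits = FALSE)
```

Because get signals an error when a key is absent, the exists check (with inherits = FALSE) is a cheap guard before any lookup whose key may not have been set.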

Here's a great resource explaining this in more detail: http://broadcast.oreilly.com/2010/03/lookup-performance-in-r.html

Upvotes: 3
