I am very new to R, so forgive me if this is an extremely basic question. Following the instructions below, I edited the question so that it hopefully makes more sense.
I have a data frame d that looks like this:
SAMPLE <- c("blueberry", "broccoli")
OPT1 <- c("apple", "beef")
OPT2 <- c("oatmeal", "bacon")
RESPONSE <- c("oatmeal", "beef")
d <- data.frame(SAMPLE, OPT1, OPT2, RESPONSE)
# add a column of NA for the new data
d$OPT1.D <- rep(NA, nrow(d))
and a distance matrix dist:
X <- c("blueberry", "beef", "oatmeal", "broccoli")
blueberry <- c("0", "0.17", "0.09", "0.21")
beef <- c("0.15", "0", "0.979", "0.75")
oatmeal <- c("0.09", "0.375", "0", "0.71")
broccoli <- c("0.25", "0.671", "0.45", "0")
dist <- data.frame(X, blueberry, beef, oatmeal, broccoli)
So I want to find the row/column match in dist for d$RESPONSE and d$SAMPLE. In the new column d$OPT1.D, the first entry should be 0.09, which is the distance between "oatmeal" and "blueberry" in dist. The second entry should be 0.671, the distance between "beef" and "broccoli".
I hope this makes more sense. I used the code below, but it returned text rather than numbers.
d$OPT1.D <- dist[cbind(d$RESPONSE, d$SAMPLE)]
Many thanks.
Overall, this seems like it should be a fairly straightforward operation, but after searching for a bit I can't tell whether it is best done with a for loop or with a package like data.table. Advice would be appreciated!
Upvotes: 2
Views: 107
Reputation: 377
Your first problem is that the columns of d are factors, which are converted to integers (not characters) when you use them as indices in dist[cbind(d$RESPONSE, d$SAMPLE)]. You need to pass stringsAsFactors = FALSE when you call data.frame (from R 4.0.0 onward this is already the default).
d <- data.frame(SAMPLE,OPT1,OPT2, RESPONSE, stringsAsFactors=FALSE)
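To see why factors get in the way, here is a minimal sketch (the vector f is made up for illustration and is not part of the question's data). A factor stores integer codes into its sorted levels, and cbind() on factors produces a matrix of those codes rather than of the labels:

```r
# Hypothetical factor built from the two RESPONSE values
f <- factor(c("oatmeal", "beef"))

levels(f)        # levels are sorted alphabetically: "beef" "oatmeal"
as.integer(f)    # internal codes: 2 1
as.character(f)  # the labels you actually want: "oatmeal" "beef"

# cbind() on factors yields an integer matrix of codes, so matrix
# indexing would pick cells by position instead of by name
idx <- cbind(f, f)
is.numeric(idx)  # TRUE
```

With character columns, cbind() instead gives a character matrix, which is what name-based matrix indexing needs.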
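The second problem is that dist is a data.frame without row names, so there is nothing for the row labels to match against. You also don't need X to be a column. Note that the distance values have to be numeric vectors, not quoted strings, for the lookup to return numbers.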
dist <- cbind(blueberry,beef, oatmeal, broccoli)
rownames(dist) <- colnames(dist) <- X
With these changes, the lookup should do what you want.
dist[cbind(d$RESPONSE, d$SAMPLE)]
[1] 0.090 0.671
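If you would rather keep the data frame from the question as it is, quoted values included, the following sketch converts it to a named numeric matrix before the lookup. The names dist_df and dist_m are mine, chosen to avoid masking R's built-in dist() function; treat this as one possible approach rather than the only one.

```r
# Data as given in the question (distances stored as strings)
d <- data.frame(SAMPLE   = c("blueberry", "broccoli"),
                RESPONSE = c("oatmeal", "beef"),
                stringsAsFactors = FALSE)
dist_df <- data.frame(X         = c("blueberry", "beef", "oatmeal", "broccoli"),
                      blueberry = c("0", "0.17", "0.09", "0.21"),
                      beef      = c("0.15", "0", "0.979", "0.75"),
                      oatmeal   = c("0.09", "0.375", "0", "0.71"),
                      broccoli  = c("0.25", "0.671", "0.45", "0"),
                      stringsAsFactors = FALSE)

# Drop the label column, then turn the character matrix into a numeric one
dist_m <- as.matrix(dist_df[, -1])
storage.mode(dist_m) <- "double"
rownames(dist_m) <- dist_df$X

# Each row of the two-column character matrix is one (row name, column name) pair
d$OPT1.D <- dist_m[cbind(d$RESPONSE, d$SAMPLE)]
is.numeric(d$OPT1.D)  # TRUE
```

The new column then holds the two distances the question asks for, stored as numbers rather than text.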
Upvotes: 2
Reputation: 37754
This is tailor-made for matrix indexing, a little-known but very powerful feature of R. All you need is this command (and then the same again for OPT2).
d$OPT1D <- dist[cbind(d$RESPONSE, d$OPT1)]
By the way, it is helpful to include your data in a way that others can easily read it in. Here's what I did to get it.
d <- read.table(text="SAMPLE OPT1 OPT2 RESPONSE OPT1D OPT2D
banana blueberry oatmeal oatmeal NA NA
broccoli beef bacon beef NA NA",
header=TRUE, stringsAsFactors=FALSE)
dist <- read.table(text="blueberry beef oatmeal
0 0.15 0.09
0.17 0 0.0872
0.09 0.0979 0", header=TRUE, stringsAsFactors=FALSE)
dist <- as.matrix(dist)
rownames(dist) <- colnames(dist)
> d
SAMPLE OPT1 OPT2 RESPONSE OPT1D OPT2D
1 banana blueberry oatmeal oatmeal 0.09 NA
2 broccoli beef bacon beef 0.00 NA
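One caveat when repeating the lookup for OPT2: in the sample data "bacon" has no row or column in dist, and indexing a matrix by a name that does not exist raises a "subscript out of bounds" error. The sketch below works around that with a small helper that returns NA for unknown names. The helper lookup_dist and the name dist_m are hypothetical, introduced only for this illustration.

```r
d <- data.frame(SAMPLE   = c("banana", "broccoli"),
                OPT1     = c("blueberry", "beef"),
                OPT2     = c("oatmeal", "bacon"),
                RESPONSE = c("oatmeal", "beef"),
                stringsAsFactors = FALSE)

foods <- c("blueberry", "beef", "oatmeal")
dist_m <- matrix(c(0,    0.15,   0.09,
                   0.17, 0,      0.0872,
                   0.09, 0.0979, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(foods, foods))

# Hypothetical helper: look up m[rows[i], cols[i]] for each i,
# giving NA where either name is missing from the matrix
lookup_dist <- function(m, rows, cols) {
  ok  <- rows %in% rownames(m) & cols %in% colnames(m)
  out <- rep(NA_real_, length(rows))
  if (any(ok)) {
    out[ok] <- m[cbind(rows[ok], cols[ok])]
  }
  out
}

# Fill OPT1D and OPT2D with the same lookup
for (opt in c("OPT1", "OPT2")) {
  d[[paste0(opt, "D")]] <- lookup_dist(dist_m, d$RESPONSE, d[[opt]])
}
```

OPT1D reproduces the values shown above, while the "bacon" row of OPT2D comes out as NA instead of stopping with an error.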
Upvotes: 2