Searching and matching in data frames

I am very new to R, so forgive me if this is an extremely basic question. Following the instructions below, I have edited the question so that it hopefully makes more sense.

I have a data frame d that looks like this

SAMPLE <-c("blueberry", "broccoli")
OPT1 <-c("apple", "beef")
OPT2 <-c("oatmeal", "bacon")
RESPONSE <- c("oatmeal", "beef")
d <- data.frame(SAMPLE,OPT1,OPT2, RESPONSE)

add column of NA for new data

d$OPT1.D <- rep("NA",nrow(d));

and distance matrix dist

X <-c("blueberry", "beef", "oatmeal", "broccoli")
blueberry <-c("0", "0.17", "0.09", "0.21")
beef <-c("0.15", "0", "0.979", "0.75")
oatmeal <- c("0.09", "0.375", "0", "0.71")
broccoli <- c("0.25", "0.671", "0.45", "0")
dist <- data.frame(X,blueberry,beef, oatmeal, broccoli)

For each row of d, I want to look up the entry of dist whose row matches d$RESPONSE and whose column matches d$SAMPLE. In the new column d$OPT1.D, the first entry should be 0.09, which is the distance between "oatmeal" and "blueberry" in dist. The second entry should be 0.671, the distance between "beef" and "broccoli".

I hope this makes more sense. I tried d$OPT1.D <- dist[cbind(d$RESPONSE, d$SAMPLE)], but it returned text rather than numbers. Many thanks.

Overall this seems like it should be a fairly straightforward operation, but after searching for a while I can't tell whether it is best done with a for loop or with a package like data.table. Advice would be appreciated!

Upvotes: 2

Answers (2)

Arnaud A

Reputation: 377

Your first problem is that the columns of d are factors, which are converted to integers (not characters) when you use them as indices in dist[cbind(d$RESPONSE, d$SAMPLE)]. You need to pass stringsAsFactors = FALSE when you call data.frame (this is the default from R 4.0.0 onward).

d <- data.frame(SAMPLE,OPT1,OPT2, RESPONSE, stringsAsFactors=FALSE)
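
To see why factor columns cause trouble, here is a minimal sketch. The names idx_factor and idx_chr are only for illustration, and factor() is called explicitly so the behaviour is the same regardless of the R version.

```r
# Build the two columns as factors, as data.frame() did by default before R 4.0.0
RESPONSE <- factor(c("oatmeal", "beef"))
SAMPLE   <- factor(c("blueberry", "broccoli"))

# cbind() drops the factor class and keeps only the integer codes,
# so the names needed for the lookup are lost
idx_factor <- cbind(RESPONSE, SAMPLE)
typeof(idx_factor)

# Converting to character first keeps the names
idx_chr <- cbind(as.character(RESPONSE), as.character(SAMPLE))
typeof(idx_chr)
```

An integer index matrix is interpreted as row and column positions, not names, which is why the lookup does not go where you expect.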

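The second problem is that dist is a data.frame with no row names, and its values are character strings rather than numbers. You also don't need X to be a column; use it for the row and column names of a numeric matrix instead.

dist <- cbind(blueberry, beef, oatmeal, broccoli)
rownames(dist) <- colnames(dist) <- X
storage.mode(dist) <- "numeric"  # the vectors were defined as strings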

Like this it should do what you want.

dist[cbind(d$RESPONSE, d$SAMPLE)]
[1] 0.090 0.671
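
Putting the pieces together, here is a self-contained sketch of the whole lookup using the data from the question. Two things are my own changes and not from the question: the distances are entered as numbers rather than strings, and the vector of names is called foods instead of X.

```r
d <- data.frame(
  SAMPLE   = c("blueberry", "broccoli"),
  OPT1     = c("apple", "beef"),
  OPT2     = c("oatmeal", "bacon"),
  RESPONSE = c("oatmeal", "beef"),
  stringsAsFactors = FALSE  # keep the columns as character vectors
)

# Numeric matrix; each named argument becomes one column
foods <- c("blueberry", "beef", "oatmeal", "broccoli")
dist <- cbind(
  blueberry = c(0,    0.17,  0.09,  0.21),
  beef      = c(0.15, 0,     0.979, 0.75),
  oatmeal   = c(0.09, 0.375, 0,     0.71),
  broccoli  = c(0.25, 0.671, 0.45,  0)
)
rownames(dist) <- foods

# Two-column character matrix: first column picks the row, second the column
d$OPT1.D <- dist[cbind(d$RESPONSE, d$SAMPLE)]
d$OPT1.D
```

Because dist is numeric from the start, the new column comes back as numbers and no conversion step is needed.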

Upvotes: 2

Aaron - mostly inactive

Reputation: 37754

This is tailor-built for matrix indexing, a little-known but very powerful feature of R. All you need is this command (and then repeat for OPT2).

d$OPT1D <- dist[cbind(d$RESPONSE, d$OPT1)]

By the way, it is helpful to include your data in a way that others can easily read it in. Here's what I did to get it.

d <- read.table(text="SAMPLE        OPT1        OPT2        RESPONSE        OPT1D        OPT2D
banana        blueberry   oatmeal     oatmeal         NA           NA
broccoli      beef        bacon       beef            NA           NA",
                 header=TRUE, stringsAsFactors=FALSE)
dist <- read.table(text="blueberry      beef           oatmeal
0              0.15           0.09
0.17           0              0.0872
0.09           0.0979         0", header=TRUE, stringsAsFactors=FALSE)
dist <- as.matrix(dist)
rownames(dist) <- colnames(dist)



> d
    SAMPLE      OPT1    OPT2 RESPONSE OPT1D OPT2D
1   banana blueberry oatmeal  oatmeal  0.09    NA
2 broccoli      beef   bacon     beef  0.00    NA
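
One caveat when repeating this for OPT2: "bacon" does not appear in dist, and indexing a matrix by a name it does not have stops with a subscript out of bounds error. A possible workaround, sketched below, is to translate the names to positions with match() first, so that unknown names become NA. The helper lookup_dist is hypothetical, not something from base R or from the question.

```r
# Same distances as the example matrix above, built directly as a numeric matrix
foods <- c("blueberry", "beef", "oatmeal")
dist <- matrix(c(0,    0.15,   0.09,
                 0.17, 0,      0.0872,
                 0.09, 0.0979, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(foods, foods))

# Hypothetical helper: returns NA for any name that is absent from the matrix
lookup_dist <- function(m, rows, cols) {
  i <- match(rows, rownames(m))  # NA where the row name is unknown
  j <- match(cols, colnames(m))  # NA where the column name is unknown
  m[cbind(i, j)]                 # an NA in an index row yields NA in the result
}

response <- c("oatmeal", "beef")
opt2     <- c("oatmeal", "bacon")
res <- lookup_dist(dist, response, opt2)
res
```

Here the first value is the oatmeal/oatmeal distance and the second is NA, since "bacon" has no column in dist.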

Upvotes: 2
