Steve

Reputation: 5977

Data reshaping and logical indexing in R

I have the following (dummy) data:

d <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 5L, 
5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("apple", "grapefruit", 
"orange", "peach", "pear"), class = "factor"), type = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("large", 
"small"), class = "factor"), location = structure(c(1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("P1", 
"P2", "P3"), class = "factor"), diameter = c(17.2, 19.1, 18.5, 
23.3, 22.9, 19.4, 11.1, 11.8, 6.8, 3.2, 7.9, 5.6, 8.4, 9.2, 9.7, 
17.1, 19.4, 18.9, 11.8, 10.6, 10.1, 18.8, 17.9, 13.2, 8.5, 8.9, 
7.2, 10.1, 8.7, 6.6)), .Names = c("group", "type", "location", 
"diameter"), class = "data.frame", row.names = c(NA, -30L))

I'd like to create a new data frame from this, containing ratios of "diameter" between levels of "location" (P2/P1 and P3/P1) for each combination of "group" and "type". For example, for "pear":

P3.P1.L <- with(d, diameter[group=="pear" & type=="large" & location=="P3"] / diameter[group=="pear" & type=="large" & location=="P1"] )
P2.P1.L <- with(d, diameter[group=="pear" & type=="large" & location=="P2"] / diameter[group=="pear" & type=="large" & location=="P1"] )
P3.P1.S <- with(d, diameter[group=="pear" & type=="small" & location=="P3"] / diameter[group=="pear" & type=="small" & location=="P1"] )
P2.P1.S <- with(d, diameter[group=="pear" & type=="small" & location=="P2"] / diameter[group=="pear" & type=="small" & location=="P1"] )

The final data.frame would look something like this:

group, type, P2.P1, P3.P1
pear, large, 2.469, 1.75
pear, small, 1.063, 0.613
apple, large, ..., ...
apple, small, ..., ...

Obviously, I could do this as I've illustrated above, logically indexing the correct levels of the 3 factors in each instance. The problem is that my real data has about 40 levels in the "group" factor (though still only 2 in "type"). I'd like a solution that lets me use logical indexing with "location" and perhaps "type", and then iterate through all the levels of "group". For example, something like:

with(d, by(d, group, function(x) diameter[type=="large" & location=="P3"] / diameter[type=="large" & location=="P1"]) )

But this doesn't quite do what I'd like (and indexing with "group==x" doesn't work either).

A solution that keeps track of the association of each ratio with its "group" and "type" factor levels, and then places these into the new data frame, as illustrated in the desired output above, would be spectacular. Any suggestions on how to approach this would be much appreciated.

Upvotes: 0

Views: 242

Answers (1)

Vincent Zoonekynd

Reputation: 32351

You can use dcast to convert the data to a wider format.

library(reshape2)
d <- dcast( d, group + type ~ location, value.var = "diameter" )

It is then straightforward to compute the ratios you want, for instance:

transform( d, P2.P1=P2/P1, P3.P1=P3/P1 )
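If reshape2 isn't available, the same widen-then-divide idea can be sketched in base R with `reshape()`. This is only an illustration, not part of the original answer: `d_small` is a hypothetical subset of the question's data (the apple and pear rows), and the `diameter.P1`-style column names are what `reshape()` generates by default.

```r
# Hypothetical subset of the question's data: apple and pear rows only.
d_small <- data.frame(
  group    = rep(c("apple", "pear"), each = 6),
  type     = rep(rep(c("small", "large"), each = 3), times = 2),
  location = rep(c("P1", "P2", "P3"), times = 4),
  diameter = c(17.2, 19.1, 18.5, 23.3, 22.9, 19.4,
               11.1, 11.8, 6.8, 3.2, 7.9, 5.6)
)

# One row per group/type combination, one diameter column per location
# (diameter.P1, diameter.P2, diameter.P3).
wide <- reshape(d_small, idvar = c("group", "type"),
                timevar = "location", direction = "wide")

# Ratios relative to P1, keeping the group/type labels alongside them.
res <- transform(wide,
                 P2.P1 = diameter.P2 / diameter.P1,
                 P3.P1 = diameter.P3 / diameter.P1)
res <- res[, c("group", "type", "P2.P1", "P3.P1")]
print(res)
```

Because the ratios are computed column-wise on the wide table, this should scale to any number of "group" levels without per-level indexing.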

Upvotes: 2
