Remi.b

Reputation: 18239

R: function that checks for a complete design

I want to create a function that takes from 1 to 10 vectors and returns Yes or No, indicating whether all my vectors together represent a "complete design".

Here is an example of what I would call a "complete design":

a <- c(1,1,1,1,2,2,2,2,3,3,3,3)

b <- c(1,2,1,2,1,2,1,2,1,2,1,2)

c <- c(1,1,2,2,1,1,2,2,1,1,2,2)

It is a complete design because, for every level of every vector, each level of every other vector occurs (at the same positions) the same number of times.

Here are two examples of an "incomplete design". (In both examples, a and b match each other, but c matches neither a nor b.)

example 1:

a <- c(1,1,1,1,2,2,2,2,3,3,3,3)

b <- c(1,2,1,2,1,2,1,2,1,2,1,2)

c <- c(1,2,3,1,2,3,1,2,3,1,2,3)

example 2:

a <- c(1,1,1,1,2,2,2,2,3,3,3,3)

b <- c(1,2,1,2,1,2,1,2,1,2,1,2)

c <- c(1,2,3,4,5,1,2,3,4,5,1,2)

I hope this is clear. The whole idea is that I have a data set that is explained by the factors a, b, c, d, e, etc., and I want a function that tells me whether I'm testing a complete or an incomplete design before running aov() on it.

Thanks a lot !

The question the function should ask is something like this: when a equals a given level (say 2), look at b[which(a==2)] and check whether it contains all levels of b, and also whether each level in b[which(a==2)] is repeated the same number of times.

Upvotes: 1

Views: 130

Answers (2)

hadley

Reputation: 103938

There's a very simple way to do this using plyr's id() function:

library(plyr)
a <- c(1,1,1,1,2,2,2,2,3,3,3,3)
b <- c(1,2,1,2,1,2,1,2,1,2,1,2)
c <- c(1,1,2,2,1,1,2,2,1,1,2,2)

ids <- id(data.frame(a, b, c))
attr(ids, "n") == length(unique(ids))
# [1] TRUE

d <- c(1,1,2,2,1,1,2,2,1,1,2,3)
ids <- id(data.frame(a, b, d))
attr(ids, "n") == length(unique(ids))
# [1] FALSE

id() works by assigning a unique id to each row in the input, in such a way that there's room for all possible combinations. The output contains an attribute, n, which gives the total possible number of combinations.
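If you'd rather not depend on plyr, the same comparison can be sketched in base R: count the distinct rows that actually occur and compare that with the product of the number of levels in each column. This is only an illustration of the idea, not how id() is implemented, and is_complete is a made-up name.

```r
# Hypothetical helper (not part of plyr): TRUE when every possible
# combination of levels appears in at least one row of df.
is_complete <- function(df) {
  # Number of combinations that could occur: product of per-column level counts
  n_possible <- prod(vapply(df, function(x) length(unique(x)), integer(1)))
  # Number of combinations that do occur: distinct rows
  n_observed <- nrow(unique(df))
  n_observed == n_possible
}

a <- c(1,1,1,1,2,2,2,2,3,3,3,3)
b <- c(1,2,1,2,1,2,1,2,1,2,1,2)
c <- c(1,1,2,2,1,1,2,2,1,1,2,2)
d <- c(1,1,2,2,1,1,2,2,1,1,2,3)

is_complete(data.frame(a, b, c))
# [1] TRUE
is_complete(data.frame(a, b, d))
# [1] FALSE
```

Like the id() version, this only checks that every combination is present. It does not check that each combination occurs the same number of times.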

Upvotes: 4

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

If I understand correctly, you might be able to use interaction() as a first cut at deciding whether your design is complete.

Looking at your examples, it seems like each combination of the unique levels across the vectors should occur exactly once (not 2 times, not 0 times). So, for the first set you show:

> all(table(interaction(a, b, c)) == 1)
[1] TRUE

And, for the other two examples, if you did the same, you would get FALSE as the result.
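If each combination is allowed to occur more than once, as long as all combinations occur equally often (a replicated design), the == 1 test would be too strict. A minimal sketch of a relaxed check that takes any number of vectors and returns "Yes" or "No" might look like this. The name complete_design is made up for the example, and I'm assuming that "complete" means "every combination present, all with the same count".

```r
# Hypothetical helper: "Yes" if every combination of levels occurs and
# all combinations occur the same number of times, "No" otherwise.
complete_design <- function(...) {
  # interaction() keeps unused combinations, so missing cells show up as 0
  counts <- as.vector(table(interaction(...)))
  if (all(counts > 0) && all(counts == counts[1])) "Yes" else "No"
}

a  <- c(1,1,1,1,2,2,2,2,3,3,3,3)
b  <- c(1,2,1,2,1,2,1,2,1,2,1,2)
c  <- c(1,1,2,2,1,1,2,2,1,1,2,2)
c2 <- c(1,2,3,1,2,3,1,2,3,1,2,3)

complete_design(a, b, c)
# [1] "Yes"
complete_design(a, b, c2)
# [1] "No"
complete_design(rep(a, 2), rep(b, 2), rep(c, 2))  # each combination twice
# [1] "Yes"
```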


Another option would be to assume that we can treat variable a as a grouping variable, and put all the vectors into a data.frame like this:

df1 <- data.frame(a = c(1,1,1,1,2,2,2,2,3,3,3,3),
                  b = c(1,2,1,2,1,2,1,2,1,2,1,2),
                  c = c(1,1,2,2,1,1,2,2,1,1,2,2))

Once we have done that, we can split that data.frame as follows:

DF1 <- split(df1[-1], df1[1])

Then, we can write a little function to check that all parts of the split are equal. We'll cheat a little bit and use merge, but there must be more robust ways to do this. The idea is that if we use merge on inputs that are identical, the result should be a single data.frame that is the same as each of the input data.frames.

Here's a (not so robust--not extensively tested) function that can be used as a starting point.

myFun <- function(myList) {
  all.equal(Reduce(function(x, y) 
    merge(x, y, all = TRUE, sort = FALSE), myList),
            myList[[1]], check.attributes = FALSE)
}

Applied to DF1, it gives us TRUE, but try the following:

df2 <- data.frame(a = c(1,1,1,1,2,2,2,2,3,3,3,3),
                  b = c(1,2,1,2,1,2,1,2,1,2,1,2),
                  c = c(1,2,3,1,2,3,1,2,3,1,2,3))
df3 <- data.frame(a = c(1,1,1,1,2,2,2,2,3,3,3,3),
                  b = c(1,2,1,2,1,2,1,2,1,2,1,2),
                  c = c(1,2,3,4,5,1,2,3,4,5,1,2))

DF1 <- split(df1[-1], df1[1])
DF2 <- split(df2[-1], df2[1])
DF3 <- split(df3[-1], df3[1])

myFun(DF1)
# [1] TRUE
myFun(DF2)
# [1] "Component 1: Numeric: lengths (6, 4) differ" "Component 2: Numeric: lengths (6, 4) differ"
myFun(DF3)
# [1] "Component 1: Numeric: lengths (10, 4) differ" "Component 2: Numeric: lengths (10, 4) differ"
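Note that all.equal() returns a character vector describing the differences rather than FALSE, so myFun can't be used directly in an if() condition. One way around that, sketched here with a made-up wrapper name (isComplete), is to pass the result through isTRUE(). The snippet repeats myFun and two of the data.frames so that it runs on its own.

```r
# myFun as defined above, repeated so this snippet is self-contained
myFun <- function(myList) {
  all.equal(Reduce(function(x, y)
    merge(x, y, all = TRUE, sort = FALSE), myList),
    myList[[1]], check.attributes = FALSE)
}

# Hypothetical wrapper: collapse the result to a single TRUE/FALSE
isComplete <- function(myList) isTRUE(myFun(myList))

df1 <- data.frame(a = c(1,1,1,1,2,2,2,2,3,3,3,3),
                  b = c(1,2,1,2,1,2,1,2,1,2,1,2),
                  c = c(1,1,2,2,1,1,2,2,1,1,2,2))
df2 <- data.frame(a = c(1,1,1,1,2,2,2,2,3,3,3,3),
                  b = c(1,2,1,2,1,2,1,2,1,2,1,2),
                  c = c(1,2,3,1,2,3,1,2,3,1,2,3))

isComplete(split(df1[-1], df1[1]))
isComplete(split(df2[-1], df2[1]))
```

Given the results shown above, the first call should come back TRUE and the second FALSE.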

Upvotes: 2
