Reputation: 33

In R, how to use one table to define the columns used for a two-way ANOVA in another table?

I have two tables, m and epi. The epi table contains the names of columns in m.

  head(m[,1:6])
         Geno    11DPW      8266         80647        146207    146227
1 SB002XSB012 0.87181895    G/G           C/C          G/G        A/A
2 SB002XSB018         Na    G/G           C/T          G/G        A/A
3 SB002XSB044   1.057744    G/G           C/C          G/G        A/A
4 SB002XSB051 1.64736814    G/G           C/C          G/G        A/A
5 SB002XSB067 0.69987475    A/G           C/C          G/G        A/G
6 SB002XSB073 0.60552177    A/G           C/C          G/G        A/G

    > dim(m)

[1]   167 28234
and 
head(epi)
       SNP1      SNP2
1  7789543   12846898
2 12846898  7789543
3 24862913  4603896
4  4603896   24862913
5 50592569  7789543
6 27293494   57162585

    dim(epi)

[1] 561   2

I want to take each row of epi and run a two-way ANOVA of the two columns of m that it names against the 11DPW column of m. I tried

f<-function (x) {
 anova(lm (as.numeric(m$"11DPW")~ m[,epi[x,1]]*m[,epi[x,2]]))
 }
apply(epi,1,f)

and got this error: Error in [.data.frame(m, , epi[x, 1]) : undefined columns selected. Any suggestions? Thanks, Imri

Upvotes: 1

Answers (2)

Ricardo Saporta

Reputation: 55420

Putting aside for a moment the complications of using integers as column names (that is, assuming that issue is handled correctly), you will still get the `"undefined columns selected"` error if a column named in `epi` does not exist in `m`. You can find such entries like this:

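# TRUE wherever a column named in epi is missing from m
offendingElements <- !sapply(epi, "%in%", colnames(m))

# since an offending element likely disqualifies the row from the anova test, identify the whole row
# (rowSums avoids taking linear indices modulo nrow(epi), which yields 0 for the last row)
offendingRows <- which(rowSums(offendingElements) > 0)

# perform your apply statement over the remaining rows
# (setdiff also works when offendingRows is empty; epi[-offendingRows, ] would then return no rows)
epi[setdiff(seq_len(nrow(epi)), offendingRows), ]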
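
To see what this check does, here is a small sketch on made-up data. The names m_toy, epi_toy, missing_toy and bad_rows are hypothetical, and the SNP names are invented for illustration. It flags a row as soon as either of its SNPs has no column in the other table, using rowSums rather than modular arithmetic on linear indices (which returns 0 for the last row).

```r
# Hypothetical toy tables: m_toy only has SNP columns named "101" and "202"
m_toy <- data.frame(Geno  = c("a", "b", "c"),
                    `101` = c("G/G", "A/G", "G/G"),
                    `202` = c("C/C", "C/T", "C/C"),
                    check.names = FALSE, stringsAsFactors = FALSE)
epi_toy <- data.frame(SNP1 = c(101, 202, 303),
                      SNP2 = c(202, 101, 101))

# TRUE where a SNP listed in epi_toy has no matching column name in m_toy
missing_toy <- !sapply(epi_toy, `%in%`, colnames(m_toy))

# a row is unusable if at least one of its two SNPs is missing
bad_rows <- which(rowSums(missing_toy) > 0)
bad_rows
```

Only the third row refers to a column ("303") that m_toy does not have, so that is the only row flagged.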

CLEANING UP THE FUNCTION USED IN APPLY

When you use apply(epi, 1, f), each call of f receives an entire row of epi, not a row number. So epi[x, 1] does not give you the result you want. For example, on the 7th iteration of the apply statement, x is the equivalent of epi[7, ]. To get the value from the first column, you just need to index x directly. In your function:

Instead of       epi[x, 1]   and    epi[x, 2]
You want to use  x[[1]]      and    x[[2]]
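
A quick way to convince yourself of this is to have the function report what it receives. This is only a sketch on invented data (epi_toy and rows_seen are hypothetical names):

```r
# Hypothetical two-row version of epi
epi_toy <- data.frame(SNP1 = c(101, 202), SNP2 = c(202, 303))

# x is the row itself (a vector holding SNP1 and SNP2), so x[[1]] and x[[2]]
# are the two SNP names for that row
rows_seen <- apply(epi_toy, 1, function(x) paste(x[[1]], x[[2]], sep = " & "))
rows_seen
```

Each element of rows_seen pairs the two values from one row, which shows that x already holds the row contents and should not be used as a row index into epi.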

That is the first part. Second, we need to deal with integers as column names. VERY IMPORTANT: m[, 7823] gets you the 7823rd column of m. You have to convert the integers to strings to indicate that you want the column NAMED "7823", NOT (necessarily) the 7823rd column.

Use as.character for this:

   m[, as.character(x[[1]])]
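
The difference is easy to see on a small made-up table whose columns are deliberately out of numeric order (m_toy, by_position and by_name are hypothetical names for this sketch):

```r
# Hypothetical table: the 2nd column is NAMED "3", the 3rd column is NAMED "2"
m_toy <- data.frame(Geno = c("a", "b"),
                    `3`  = c("G/G", "A/G"),
                    `2`  = c("C/C", "C/T"),
                    check.names = FALSE, stringsAsFactors = FALSE)

by_position <- m_toy[, 2]                # 2nd column, i.e. the one named "3"
by_name     <- m_toy[, as.character(2)]  # the column named "2"
```

Here by_position holds the G/G, A/G genotypes while by_name holds C/C, C/T, so the same number selects different columns depending on whether it is passed as a number or as a string.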

PUTTING IT ALL TOGETHER

offendingElements <- !sapply(epi, "%in%", colnames(m))
offendingRows <- which(rowSums(offendingElements) > 0)
goodRows <- setdiff(seq_len(nrow(epi)), offendingRows)   # safe even when no row is offending

apply(epi[goodRows, , drop = FALSE], 1, function (x) 
   anova( lm ( as.numeric(m$"11DPW") ~ m[, as.character(x[[1]]) ] * m[, as.character(x[[2]]) ] ))
)
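
For anyone who wants to try the whole procedure without the real data, here is a self-contained sketch on invented values. It is a variant, not the code above: it loops over row numbers with lapply instead of apply, which avoids apply's conversion of the table to a matrix. All names in it (m_toy, epi_toy, present, keep, results) are hypothetical, and it assumes 11DPW is already numeric.

```r
# Hypothetical phenotype/genotype table with 12 lines and two SNP columns
m_toy <- data.frame(
  Geno    = paste0("line", 1:12),
  `11DPW` = c(0.87, NA, 1.06, 1.65, 0.70, 0.61, 1.10, 0.95, 1.30, 0.80, 1.45, 0.75),
  `101`   = rep(c("G/G", "A/G"), each = 6),
  `202`   = rep(c("C/C", "C/T"), times = 6),
  check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical SNP pairs; "303" has no column in m_toy
epi_toy <- data.frame(SNP1 = c(101, 202, 303), SNP2 = c(202, 101, 101))

# keep only the rows whose two SNPs both exist as column names in m_toy
present <- sapply(epi_toy, `%in%`, colnames(m_toy))
keep <- which(rowSums(!present) == 0)

# one two-way ANOVA (with interaction) per remaining SNP pair
results <- lapply(keep, function(i) {
  y <- m_toy[["11DPW"]]
  a <- factor(m_toy[[as.character(epi_toy$SNP1[i])]])
  b <- factor(m_toy[[as.character(epi_toy$SNP2[i])]])
  anova(lm(y ~ a * b))
})
names(results) <- paste(epi_toy$SNP1[keep], epi_toy$SNP2[keep], sep = ":")
```

Each element of results is an ANOVA table with rows for a, b, the a:b interaction and the residuals. One caveat if you adapt this: should 11DPW have been read in as a factor (for example because of the "Na" entries), as.numeric() on it returns level codes, so convert with as.numeric(as.character(...)) instead.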

An alternative way of dealing with the names is to turn them into syntactically valid strings. The simplest way to do that is make.names:

# clean up the elements in epi
epi.clean <- sapply(epi, make.names)

# clean up m's column names (compute offendingRows BEFORE this step;
# note that "11DPW" itself becomes "X11DPW")
colnames(m) <- make.names(colnames(m))

# use epi.clean in your apply statement.  Don't forget offendingRows
goodRows <- setdiff(seq_len(nrow(epi.clean)), offendingRows)
apply(epi.clean[goodRows, , drop = FALSE], 1, function (x) 
   anova( lm ( as.numeric(m$X11DPW) ~ m[, x[[1]] ] * m[, x[[2]] ] ))
)

Upvotes: 1

Carl Witthoft

Reputation: 21532

I suspect your values in epi are numbers, but what you want to use are their character equivalents, since the column names in m are character strings (even though these strings are made up of numerals). Try this instead:

m[[as.character(epi[x, 1])]] (etc, where x is a row number of epi)

The [[ operator is quirky but very cool.

Upvotes: 0
