Reputation: 33
I have two tables, m and epi. The epi table contains names of columns in m.
head(m[,1:6])
Geno 11DPW 8266 80647 146207 146227
1 SB002XSB012 0.87181895 G/G C/C G/G A/A
2 SB002XSB018 Na G/G C/T G/G A/A
3 SB002XSB044 1.057744 G/G C/C G/G A/A
4 SB002XSB051 1.64736814 G/G C/C G/G A/A
5 SB002XSB067 0.69987475 A/G C/C G/G A/G
6 SB002XSB073 0.60552177 A/G C/C G/G A/G
> dim(m)
[1] 167 28234
and
head(epi)
SNP1 SNP2
1 7789543 12846898
2 12846898 7789543
3 24862913 4603896
4 4603896 24862913
5 50592569 7789543
6 27293494 57162585
dim(epi)
[1] 561 2
I want to take each row of epi and run a two-way ANOVA of the two columns it names in m against the 11DPW column in m. I tried
f<-function (x) {
anova(lm (as.numeric(m$"11DPW")~ m[,epi[x,1]]*m[,epi[x,2]]))
}
apply(epi,1,f)
and got the error: Error in `[.data.frame`(m, , epi[x, 1]) : undefined columns selected
Any suggestions?
Thanks,
Imri
Upvotes: 1
Views: 605
Reputation: 55420
Putting aside for a moment the complications from using integers as column names (that is, assuming that this issue is handled correctly), you will still get the "undefined columns selected" error if the column indicated in epi does not exist in m. You can identify the offending entries like this:
offendingElements <- !sapply(epi, "%in%", colnames(m))
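# since an offending element likely disqualifies the row from the anova test, identify the whole row
# (rowSums avoids the modulo trick, which maps the last row to index 0)
offendingRows <- which(rowSums(offendingElements) > 0)
# perform your apply statement over (setdiff also works when offendingRows is empty):
epi[setdiff(seq_len(nrow(epi)), offendingRows), ]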
When you use apply(epi, 1, f), what you are passing to each call of f is an entire row of epi, not a row number. Therefore epi[x, 1] is not giving you the result you want. For example, on the 7th iteration of the apply statement, x is the equivalent of epi[7, ] (as a plain vector). To get the first column, you just need to index x directly. In your function, instead of epi[x, 1] and epi[x, 2], you want to use x[[1]] and x[[2]].
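To see what f actually receives, here is a minimal sketch on a made-up two-row table (epi_demo is hypothetical, not the asker's data). apply() converts the data frame to a matrix and hands each row to the function as a vector, so x[[1]] and x[[2]] are the two SNP values themselves:

```r
# Toy stand-in for epi (values are made up for illustration)
epi_demo <- data.frame(SNP1 = c(8266L, 80647L),
                       SNP2 = c(146207L, 146227L))

# Each call gets one row as a vector; x is NOT a row number
row_pairs <- apply(epi_demo, 1, function(x) paste(x[[1]], x[[2]]))
print(unname(row_pairs))
```

Each element of row_pairs combines the two SNP values from one row, which is why epi[x, 1] inside f ends up indexing epi with SNP values rather than with a row number.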
That is the first part. Second, we need to deal with integers as column names. VERY IMPORTANT: if you use m[, 7823], this will get you the 7823rd column of m. You have to be sure to convert the integers to strings, indicating that you want the column NAMED "7823", NOT (necessarily) the 7823rd column.
Use as.character for this:
m[, as.character(x[[1]])]
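The difference between positional and name-based indexing can be checked on a small made-up data frame (m_demo and its column names are assumptions for illustration) whose numeric-looking names do not match the column positions:

```r
# Two genotype columns whose names look like numbers
m_demo <- setNames(
  data.frame(c("A/A", "A/G"), c("C/C", "C/T"), stringsAsFactors = FALSE),
  c("3", "1")
)

by_position <- m_demo[, 1]                # first column, which is named "3"
by_name     <- m_demo[, as.character(1)]  # the column NAMED "1"

# m_demo[, 3] would fail with "undefined columns selected":
# there is no third column, only a column named "3"
```

Here by_position and by_name return different columns even though both were requested with the number 1.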
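offendingElements <- !sapply(epi, "%in%", colnames(m))
offendingRows <- which(rowSums(offendingElements) > 0)
# setdiff keeps all rows when offendingRows is empty (epi[-integer(0), ] would keep none)
validRows <- setdiff(seq_len(nrow(epi)), offendingRows)
apply(epi[validRows, ], 1, function (x)
  anova( lm ( as.numeric(m$"11DPW") ~ m[, as.character(x[[1]]) ] * m[, as.character(x[[2]]) ] ))
)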
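For anyone who wants to try the whole approach without the original data, the following is a sketch on simulated data. m_demo, epi_demo, and all values in them are invented for illustration; only the column-name convention and the 11DPW response are taken from the question.

```r
set.seed(1)  # only so the simulated phenotype is reproducible

# Simulated stand-in for m: 40 lines, one phenotype, three SNP columns
m_demo <- data.frame(
  rnorm(40),
  rep(c("A/A", "A/G"), each = 20),
  rep(c("C/C", "C/T"), times = 20),
  rep(c("G/G", "G/G", "G/T", "G/T"), times = 10),
  stringsAsFactors = FALSE
)
names(m_demo) <- c("11DPW", "8266", "80647", "146207")

# Simulated stand-in for epi; 999 deliberately matches no column in m_demo
epi_demo <- data.frame(SNP1 = c(8266L, 80647L, 999L),
                       SNP2 = c(80647L, 146207L, 8266L))

# Keep only rows where both SNPs exist as column names
keep <- epi_demo$SNP1 %in% colnames(m_demo) &
        epi_demo$SNP2 %in% colnames(m_demo)

results <- apply(epi_demo[keep, ], 1, function(x) {
  snp1 <- m_demo[, as.character(x[[1]])]
  snp2 <- m_demo[, as.character(x[[2]])]
  anova(lm(m_demo[["11DPW"]] ~ snp1 * snp2))
})

length(results)  # one ANOVA table per valid row of epi_demo
```

Because each call returns an ANOVA table (a data frame), apply() returns a list of tables, so results[[1]] can be inspected directly.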
There is an alternative way of dealing with the names; the simplest is to turn them into syntactically valid strings:
# clean up the elements in epi
epi.clean <- sapply(epi, make.names)
# clean up m's column names
colnames(m) <- make.names(colnames(m))
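# use epi.clean in your apply statement. Don't forget offendingRows
# (note that make.names above also renamed "11DPW" to "X11DPW")
apply(epi.clean[setdiff(seq_len(nrow(epi.clean)), offendingRows), , drop = FALSE], 1, function (x)
  anova( lm ( as.numeric(m$X11DPW) ~ m[, x[[1]] ] * m[, x[[2]] ] ))
)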
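As a quick check of what make.names does to names like the ones in the question (a sketch, using a few of the column names shown there): it prefixes any name that starts with a digit with "X", and that includes the response column 11DPW, not just the SNP columns.

```r
original <- c("Geno", "11DPW", "8266", "80647")
cleaned  <- make.names(original)
print(cleaned)

# Values taken from epi must be converted the same way to match
snp_matches <- make.names(7789543) %in% make.names(c("7789543", "12846898"))
print(snp_matches)
```

After cleaning, the phenotype column therefore has to be referred to as X11DPW.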
Upvotes: 1
Reputation: 21532
I suspect your values in epi are numbers, but what you want to use are their character equivalents, since the column names in m are character strings (even though these strings are made up of numerals). Try this instead:
m[[as.character(epi[x, 1])]]
(etc)
The [[ operator is quirky but very cool.
Upvotes: 0