Reputation: 89
I have a data frame of ~50k measurements taken by ~3k investigators.
INVESTIGATOR_ID \\\ SAMPLE_ID \\\ MEASUREMENT
1000 \\\ 38942 \\\ 20.1
1000 \\\ 38942 \\\ 10.2
1001 \\\ 38432 \\\ 5.6
1002 \\\ 553 \\\ 10.6
...
My goal is to compare sample measurements per investigator to measurements from the entire data set.
I've used the plyr library (ddply) to summarise the data by INVESTIGATOR_ID. After merging the summaries, the end result is a data frame where each row consists of an investigator ID, the number of samples measured by that investigator, the number of samples measured by that investigator +/- 1 SD, 15000, and 50000 (where 15000 is the corresponding number of samples +/- 1 SD for the entire data frame and 50000 is the total number of samples in the entire data frame).
INVESTIGATOR_ID \\\ NUMBER_OF_SAMPLES \\\ NUMBER_OF_SAMPLES_SD \\\ 15000 \\\ 50000
How do I take each row of the data frame, convert fields c(2:5) to a matrix, run a Fisher's test, and create a new data frame of the results?
Thanks for any suggestions.
Upvotes: 2
Views: 3892
Reputation: 508
Something like this (adapted from a script of mine; it may need more modifications to fit your needs):
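get_fisher <- function(row) {
  # Fields 2-5 hold the four counts; matrix() fills the 2x2 table column-wise
  mat <- matrix(as.numeric(row[2:5]), ncol = 2)
  f <- fisher.test(mat, alternative = "two.sided")
  # Return the investigator ID together with the p-value
  c(row[1], f$p.value)
}
# apply() passes each row of df as a vector; the result has one column per row of df
fishers <- apply(df, 1, get_fisher)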
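If you want a data frame back rather than the matrix that apply() returns, a variant along these lines might work. Treat it as a sketch only: summary_df, the TOTAL_SD and TOTAL column names, and the counts are made up for illustration, and fisher_row is a hypothetical helper, not something from your data or from plyr.

```r
# Hypothetical summary table: column names follow the question where possible,
# TOTAL_SD / TOTAL and all counts are invented for illustration.
summary_df <- data.frame(
  INVESTIGATOR_ID      = c(1000, 1001, 1002),
  NUMBER_OF_SAMPLES    = c(20, 35, 12),
  NUMBER_OF_SAMPLES_SD = c(5, 30, 4),
  TOTAL_SD             = 15000,
  TOTAL                = 50000
)

# Run Fisher's exact test on the four counts of one row.
fisher_row <- function(n, n_sd, total_sd, total) {
  # Filled column-wise: column 1 = investigator, column 2 = whole data set
  mat <- matrix(c(n, n_sd, total_sd, total), ncol = 2)
  f <- fisher.test(mat, alternative = "two.sided")
  c(p_value = f$p.value, odds_ratio = unname(f$estimate))
}

# mapply() walks the count columns in parallel, so the ID column is never
# pulled into a row vector alongside the counts.
stats <- t(mapply(fisher_row,
                  summary_df$NUMBER_OF_SAMPLES,
                  summary_df$NUMBER_OF_SAMPLES_SD,
                  summary_df$TOTAL_SD,
                  summary_df$TOTAL))

# One row per investigator: ID, p-value, and estimated odds ratio
results <- data.frame(INVESTIGATOR_ID = summary_df$INVESTIGATOR_ID, stats)
```

Passing the columns by name instead of by position (2:5) means the code keeps working if the column order changes after a merge. If your real data frame has a non-numeric column, this also avoids the character coercion that apply() does on mixed-type data frames.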
Upvotes: 4