user1357079

Reputation: 89

Running a Fisher test on each row of a data frame in R

I have a data frame of ~50k measurements taken by ~3k investigators.

INVESTIGATOR_ID  SAMPLE_ID  MEASUREMENT
1000             38942      20.1
1000             38942      10.2
1001             38432      5.6
1002             553        10.6
...

My goal is to compare sample measurements per investigator to measurements from the entire data set:

  1. For each investigator, count the measurements that are +/- one standard deviation from the mean of that investigator's measurements.
  2. For the entire data frame, count those measurements that are +/- one standard deviation from the mean.
  3. For each investigator that has sample measurements +/- one standard deviation from the mean, run a Fisher's exact test to determine if the number of samples is significant (compared to the entire data frame).
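
Steps 1 and 2 could be sketched in base R as follows. This is only an illustration under stated assumptions: the toy values are made up, the helper `count_within_sd` is a hypothetical name, and "+/- one standard deviation" is read here as "within one SD of the mean" (flip the comparison if the opposite is intended).

```r
# Toy data with the same columns as the question (values are made up)
measurements <- data.frame(
  INVESTIGATOR_ID = c(1000, 1000, 1000, 1001, 1001, 1002, 1002, 1002),
  SAMPLE_ID       = c(1, 2, 3, 4, 5, 6, 7, 8),
  MEASUREMENT     = c(20.1, 10.2, 15.0, 5.6, 7.4, 10.6, 30.2, 11.0)
)

# Assumption: count the values lying within one SD of the mean of x.
# With a single value sd() is NA, so the count is NA as well.
count_within_sd <- function(x) {
  sum(abs(x - mean(x)) <= sd(x))
}

# Step 1: per-investigator sample totals and counts within one SD.
# tapply() orders groups by sorted ID, matching sort(unique(...)).
ids <- measurements$INVESTIGATOR_ID
per_investigator <- data.frame(
  INVESTIGATOR_ID      = sort(unique(ids)),
  NUMBER_OF_SAMPLES    = as.vector(tapply(measurements$MEASUREMENT, ids, length)),
  NUMBER_OF_SAMPLES_SD = as.vector(tapply(measurements$MEASUREMENT, ids, count_within_sd))
)

# Step 2: the same count over the entire data frame
overall_sd <- count_within_sd(measurements$MEASUREMENT)
overall_n  <- nrow(measurements)
```

The two overall values can then be attached to `per_investigator` as constant columns to get one row per investigator.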

I've used the plyr library (ddply) to summarise the data by INVESTIGATOR_ID. After merging, the end result is a data frame in which each row consists of an investigator ID, the number of samples measured by that investigator, the number of those samples +/- 1 SD, 15000, and 50000 (where 15000 and 50000 are, respectively, the number of samples +/- 1 SD and the total number of samples for the entire data frame).

INVESTIGATOR_ID  NUMBER_OF_SAMPLES  NUMBER_OF_SAMPLES_SD  15000  50000

How do I take each row of the data frame, convert fields c(2:5) to a matrix, run a Fisher's test, and create a new data frame of the results?

Thanks for any suggestions.

Upvotes: 2

Views: 3892

Answers (1)

vodka

Reputation: 508

Something like this (adapted from a script of mine; it may need more modifications to fit your needs):

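get_fisher <- function(row) {
  # Fields 2 to 5 of the row become a 2x2 matrix, filled column-wise
  mat <- matrix(as.numeric(row[2:5]), ncol = 2)
  f <- fisher.test(mat, alternative = "two.sided")
  # Return the investigator ID together with the p-value
  c(INVESTIGATOR_ID = row[[1]], p_value = f$p.value)
}

# apply() returns one column per input row, so transpose before
# converting the result to a data frame (assumes df is all numeric)
fishers <- as.data.frame(t(apply(df, 1, get_fisher)))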
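
For illustration, here is a self-contained run of the same idea on made-up summary rows. The column names `TOTAL_SD` and `TOTAL_SAMPLES`, the helper `row_p_value`, and all counts are hypothetical; the question only gives 15000 and 50000 as the overall values.

```r
# Hypothetical summary rows in the layout described in the question
summary_df <- data.frame(
  INVESTIGATOR_ID      = c(1000, 1001, 1002),
  NUMBER_OF_SAMPLES    = c(20, 15, 30),
  NUMBER_OF_SAMPLES_SD = c(5, 12, 9),
  TOTAL_SD             = 15000,
  TOTAL_SAMPLES        = 50000
)

# Same idea as get_fisher: fields 2 to 5 become a 2x2 matrix, filled column-wise
row_p_value <- function(row) {
  mat <- matrix(as.numeric(row[2:5]), ncol = 2)
  fisher.test(mat, alternative = "two.sided")$p.value
}

# One p-value per row, stored next to the investigator ID in a new data frame
results <- data.frame(
  INVESTIGATOR_ID = summary_df$INVESTIGATOR_ID,
  P_VALUE         = apply(summary_df, 1, row_p_value)
)
```

This uses the four fields exactly as they are. Fisher's exact test normally expects the cells of the table to be disjoint counts, so if the columns hold totals, it may be worth rebuilding each table as (+/- 1 SD, not +/- 1 SD) before testing. That depends on the intended analysis, which the question does not spell out.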

Upvotes: 4
