Joep_S

Reputation: 537

Statistical testing for multiple columns from a dataframe

For the data frame below, I want to perform Kolmogorov-Smirnov tests for multiple columns. Column ID is the record ID, and A-D are factors with 2 levels each ('Other', shown as O in the data, and A, B, C, D respectively). My test variable is in column E.

Now I would like to perform 4 KS tests, one per factor column, each comparing the distribution of E between the two levels of that column.

In reality, I have 80 columns, so I'm looking for a way to perform these 80 tests 'simultaneously'.

  ID A B C D  E
1  1 O B C O  1
2  2 O O O O  3
3  3 O O O D  2
4  4 A O C D  7
5  5 A B O O 12
6  6 O O O O  4
7  7 O B O O  8

Upvotes: 1

Views: 1538

Answers (1)

Sergio.pv

Reputation: 1400

I hope this solves your problem:

dat <- read.table("path/data.txt", header = TRUE) # your data, imported into my session

cols <- c("A", "B", "C", "D") # the factor columns to test; the other columns are left out
E <- dat$E # the test variable
results <- lapply(cols, function(i) { # split E by the two levels of each column
  x <- factor(dat[, i])
  a <- E[x == levels(x)[1]]
  b <- E[x == levels(x)[2]]
  ks.test(a, b) # two-sample KS test between the two groups
})
names(results) <- cols # label each result with its column name
results # a list with one test result per column
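
With 80 columns, typing out the column names and reading through a long list of printed tests gets tedious. Here is a minimal, self-contained sketch of one way to handle that. It assumes every column other than ID and E is a two-level factor. It rebuilds the example data from the question so that it runs on its own. The names `results` and `summary_table` are only illustrative.

```r
# Rebuild the example data from the question ("O" stands for "Other").
dat <- data.frame(
  ID = 1:7,
  A  = c("O", "O", "O", "A", "A", "O", "O"),
  B  = c("B", "O", "O", "O", "B", "O", "B"),
  C  = c("C", "O", "O", "C", "O", "O", "O"),
  D  = c("O", "O", "D", "D", "O", "O", "O"),
  E  = c(1, 3, 2, 7, 12, 4, 8)
)

# Assumption: every column except ID and E is a two-level factor to test.
cols <- setdiff(names(dat), c("ID", "E"))

# Run one two-sample KS test per column, splitting E by that column's levels.
results <- lapply(cols, function(col) {
  groups <- split(dat$E, factor(dat[[col]]))
  ks.test(groups[[1]], groups[[2]])
})
names(results) <- cols

# Collect the test statistic and p-value of each test into one table.
summary_table <- data.frame(
  column    = cols,
  statistic = sapply(results, function(r) unname(r$statistic)),
  p_value   = sapply(results, function(r) r$p.value),
  row.names = NULL
)
print(summary_table)
```

Each full test object is still available by name, for example `results[["A"]]`. If any column turns out to have more or fewer than two levels, this sketch would need an extra check before calling `ks.test`.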

Upvotes: 3
