fourdegreeswester

Reputation: 13

One-Sample Wilcoxon Signed Rank Test over multiple columns in R

I would like to use a one-sample Wilcoxon Signed Rank Test to test whether each column of a data frame in R is significantly greater than 0. I can go through each column individually, but I would ideally like to use lapply to cycle through the columns and record the p-values in a separate data frame. Each row of the data frame holds the monthly values for a given year:

df = data.frame("year"=c(1:20), "jan"=runif(20), "feb"=runif(20))

... and so on, for 13 columns in total: year plus one column per month.

The code I am using now tests one column at a time against zero, but I would like to use lapply to streamline things a bit:

wilcox.test(df[,1], mu=0, alternative="greater")

I have tried:

res = lapply(df, function(x){
      wilcox.test(df[,x[1]], mu=0, alternative="greater")
      })

But I get an error that the input to wilcox.test is not numeric, which makes me think it is not reading in individual columns. I have tried using some suggestions in this post but am having trouble adapting the code to a one-sample test. I am new to lapply and to writing functions, so any help is greatly appreciated!

Upvotes: 1

Views: 975

Answers (1)

Ian Campbell

Reputation: 24770

A data.frame is a list of columns, so lapply iterates over them directly and passes each column to your function as x. Make sure you only pass the columns you want to test (numeric month values) by subsetting to them, here dropping year in column 1:

lapply(df[,2:13],function(x){wilcox.test(x, mu=0, alternative="greater")})
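If the column positions might change, the month columns can also be selected by name rather than by position. A minimal sketch, assuming (as in the question's data) that `year` is the only column that should not be tested; `month_cols` and `res` are illustrative names, and `set.seed(1)` is only there to make the example data reproducible.

```r
# Rebuild example data shaped like the question's: year plus 12 monthly columns
set.seed(1)
df <- data.frame(year = 1:20, lapply(rep(20, 12), runif))
names(df)[2:13] <- tolower(month.abb)

# Select every column except `year` by name, so the code does not depend on positions
month_cols <- setdiff(names(df), "year")

# lapply passes each column vector as x; run the one-sample test on it directly
res <- lapply(df[month_cols], function(x) {
  wilcox.test(x, mu = 0, alternative = "greater")
})

names(res)
```

`setdiff()` keeps the order in which the names appear in `df`, so the result list is still ordered jan through dec.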

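Your version doesn't work because lapply already hands each column to your function as x. x[1] is then the first value of that column, not a column index, so df[,x[1]] indexes df by a data value. For a column of runif() values that first value lies between 0 and 1, which R truncates to 0, so df[,x[1]] becomes df[,0], a data frame with no columns, and wilcox.test rejects it as non-numeric. Use x itself instead.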
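To see the failure step by step, here is a small sketch that reproduces what happens inside the original function for one column. It assumes the three-column data from the question; `set.seed(1)`, `empty` and `err` are illustrative additions.

```r
set.seed(1)
df <- data.frame(year = 1:20, jan = runif(20), feb = runif(20))

# Inside lapply(df, ...), x is already a whole column, e.g. df$jan
x <- df$jan

# x[1] is a single value between 0 and 1; as a column index R truncates it to 0
empty <- df[, x[1]]
dim(empty)  # 20 rows, 0 columns

# That zero-column data frame is not a numeric vector, so wilcox.test() stops with an error
err <- tryCatch(wilcox.test(empty, mu = 0, alternative = "greater"),
                error = function(e) conditionMessage(e))
err

# Passing the column itself works
wilcox.test(x, mu = 0, alternative = "greater")$p.value
```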

To streamline things even further, use sapply, which simplifies the result to a named vector, and extract $p.value from each test so that only the p-values are kept:

sapply(df[,2:13],function(x){wilcox.test(x, mu=0, alternative="greater")$p.value})
#         jan          feb          mar          apr          may          jun          jul          aug          sep          oct          nov 
#9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 9.536743e-07 
#         dec 
#9.536743e-07 
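All twelve p-values are identical, and with this example data that is expected rather than a bug. runif() only returns values between 0 and 1, so for every month all 20 differences from mu = 0 are positive. The signed-rank statistic V is then the sum of all ranks, 20 * 21 / 2 = 210, and under the exact null distribution only 1 of the 2^20 equally likely sign patterns reaches it, giving p = 0.5^20 ≈ 9.536743e-07. A short check of that arithmetic (the seed is arbitrary):

```r
set.seed(1)
x <- runif(20)  # all values lie in (0, 1), so every difference from mu = 0 is positive

res <- wilcox.test(x, mu = 0, alternative = "greater")

# With all 20 signs positive, V is the sum of all ranks: 20 * 21 / 2 = 210
res$statistic

# Only 1 of the 2^20 equally likely sign patterns gives V = 210
c(test = res$p.value, by_hand = 0.5^20)
```

With real data containing negative values the p-values will of course differ between months.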

Data

df <- data.frame(year = 1:20, lapply(rep(20,12),runif))
names(df)[2:13] <- tolower(month.abb)
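The question also asked for the p-values to be stored in a separate data frame. A minimal sketch that wraps the sapply() result into one, with one row per month; the object names `p` and `pvals` and the column names `month` and `p.value` are illustrative choices, and `set.seed(42)` only makes the example data reproducible.

```r
# Example data in the shape used above: year plus 12 monthly columns
set.seed(42)
df <- data.frame(year = 1:20, lapply(rep(20, 12), runif))
names(df)[2:13] <- tolower(month.abb)

# One p-value per month, as a named numeric vector
p <- sapply(df[, 2:13], function(x) {
  wilcox.test(x, mu = 0, alternative = "greater")$p.value
})

# Store the results in a separate data frame, one row per month
pvals <- data.frame(month = names(p), p.value = unname(p),
                    stringsAsFactors = FALSE)
pvals
```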

Upvotes: 1
