wesleysc352

Reputation: 617

How to make a correlation matrix with specific columns in R

I have a multi-column dataframe that can be divided into two categories: land use and water quality.

I would like to analyze correlations only between water quality variables and land use variables, without computing correlations among the land use variables themselves or among the water quality variables themselves.

I am using the corrplot package with the Spearman method, but I don't know how to exclude the correlations within each category.

Land use = veg, wet, dry, water

Water quality = OD, DBO, DQO

library(ggplot2)
library(dplyr)
library(corrplot)


veg<-c(1,2,3,2.3,4.1)
wet<-c(2,2.3,1.9,2.5,2.2)
dry<-c(5,5.1,6.9,4.3,5.3)
water<-c(0.69,0.75,0.81,0.82,0.82)
coli<-c(10,11,12,13,9.7)
OD<-c(1,3,2.5,2.7,1.8)
DBO<-c(7,8,9,6.5,8)
DQO<-c(3.5,4,4.1,3,2)

# land use = veg, wet, dry, water
# water quality = OD, DBO, DQO

data_land<-data.frame(veg, wet, dry, water, OD, DBO, DQO)

correl <- corrplot(cor(as.matrix(data_land), method = "spearman"),
                   method = "color",
                   tl.cex = 0.9,
                   number.cex = 0.95,
                   addCoef.col = "black")


For example, I don't want the correlation between DQO and DBO to be calculated, as they are quality parameters. I also don't want the correlation between veg and dry to be calculated, as they are land use classes, for example.

Upvotes: 0

Views: 1339

Answers (1)

Ben

Reputation: 30474

You can calculate Spearman correlations only between the land use and water quality variables by selecting columns when calling cor. Besides x, cor accepts a second matrix or data frame as y, in which case it correlates each column of x with each column of y. So you can compute correlations between a "land use" matrix (columns 1-4) and a "water quality" matrix (columns 5-7):

my_cor <- cor(data_land[, 1:4], 
              data_land[, 5:7], 
              method = "spearman")

corrplot(my_cor,
         method = "color",
         tl.cex = 0.9,
         number.cex = 0.95,
         addCoef.col = "black")

[Image: correlation plot]
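If the column order might change later, selecting by name may be safer than relying on positions 1-4 and 5-7. A minimal sketch of the same idea, assuming the toy data from the question; the vectors land_vars and quality_vars are names I made up for illustration, and the plot is only drawn if corrplot is installed:

```r
# Toy data copied from the question
data_land <- data.frame(
  veg   = c(1, 2, 3, 2.3, 4.1),
  wet   = c(2, 2.3, 1.9, 2.5, 2.2),
  dry   = c(5, 5.1, 6.9, 4.3, 5.3),
  water = c(0.69, 0.75, 0.81, 0.82, 0.82),
  OD    = c(1, 3, 2.5, 2.7, 1.8),
  DBO   = c(7, 8, 9, 6.5, 8),
  DQO   = c(3.5, 4, 4.1, 3, 2)
)

# Hypothetical helper vectors (not part of the original code)
land_vars    <- c("veg", "wet", "dry", "water")
quality_vars <- c("OD", "DBO", "DQO")

# Rows are land use variables, columns are water quality variables
my_cor <- cor(data_land[, land_vars],
              data_land[, quality_vars],
              method = "spearman")

# Draw the rectangular plot only when corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(my_cor,
                     method = "color",
                     tl.cex = 0.9,
                     number.cex = 0.95,
                     addCoef.col = "black")
}
```

The result is a 4 x 3 matrix, so no within-category correlations are computed at all.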


If you instead want a square correlation plot, with empty cells where you do not want to show correlations, you could calculate the full correlation matrix as you had before, set the entries you want to hide to NA, and then set na.label to a blank space in corrplot. Note that this also blanks the diagonal, since it falls inside the two within-category blocks:

my_cor <- cor(data_land, method = "spearman")

my_cor[1:4, 1:4] <- NA
my_cor[5:7, 5:7] <- NA

corrplot(my_cor,
         na.label = " ",
         method = "color",
         tl.cex = 0.9,
         number.cex = 0.95,
         addCoef.col = "black")
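The index ranges 1:4 and 5:7 above depend on column order. As a rough alternative, the NA mask can be built from a category label per variable. This is only a sketch under the assumption that each column belongs to exactly one category; the group vector and same_group mask are hypothetical names, not part of the original answer:

```r
# Toy data copied from the question
data_land <- data.frame(
  veg   = c(1, 2, 3, 2.3, 4.1),
  wet   = c(2, 2.3, 1.9, 2.5, 2.2),
  dry   = c(5, 5.1, 6.9, 4.3, 5.3),
  water = c(0.69, 0.75, 0.81, 0.82, 0.82),
  OD    = c(1, 3, 2.5, 2.7, 1.8),
  DBO   = c(7, 8, 9, 6.5, 8),
  DQO   = c(3.5, 4, 4.1, 3, 2)
)

my_cor <- cor(data_land, method = "spearman")

# Hypothetical lookup: which category each variable belongs to
group <- c(veg = "land", wet = "land", dry = "land", water = "land",
           OD = "quality", DBO = "quality", DQO = "quality")

# TRUE wherever the row and column variables share a category
same_group <- outer(group[rownames(my_cor)], group[colnames(my_cor)], "==")

# Hide within-category correlations (including the diagonal)
my_cor[same_group] <- NA

# Draw the square plot only when corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(my_cor,
                     na.label = " ",
                     method = "color",
                     tl.cex = 0.9,
                     number.cex = 0.95,
                     addCoef.col = "black")
}
```

This should give the same masked matrix as the index-based version, but it keeps working if columns are reordered or added, as long as group is updated.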

Upvotes: 3
