R: Correlation matrix between multiple rows (objects) over multiple columns (variables)

Question

I'm working with a data frame of multiple rows (objects) and multiple columns (variables), and I want to see whether any rows (objects) are correlated. I've read up on cor(), and it seems that for a single variable I can transpose my data frame and pass it to cor(). But how do I handle multiple variables for each observation/object? The end goal is to plot the correlation matrix as a heatmap to eyeball interesting objects.

An example is below:

Treatment <- c('Drug A','Drug B','Drug C','Drug D','Drug E','Drug F')
Measurment_V1 <- runif(6, 0, 3000)
Measurment_V2 <- runif(6, 0, 20)
Measurment_V3 <- runif(6, 0, 1)
Measurment_V4 <- runif(6, 0, 120000)
Measurment_V5 <- runif(6, 0, 100)

df<- as.data.frame(cbind(Treatment,Measurment_V1,Measurment_V2,Measurment_V3,Measurment_V4,Measurment_V5))

Each drug is described by measurements V1-V5 (in reality there are a few hundred columns). How can I get a correlation matrix between all the drugs (A, B, C, D, and so on) and then plot their correlations on a heatmap, like the Hmisc library could do?

Werner Hertzog · Accepted Answer

This might do it:

# Redo your data frame
df <- data.frame(Treatment,Measurment_V1,Measurment_V2,Measurment_V3,Measurment_V4,Measurment_V5)

# Transpose numeric columns
dft <- as.data.frame(t(df[,2:6]))

# Rename vars
names(dft) <- c("Drug_A","Drug_B","Drug_C","Drug_D","Drug_E","Drug_F")

# Correlation matrix
cor(dft)


Output:
          Drug_A    Drug_B    Drug_C    Drug_D    Drug_E    Drug_F
Drug_A 1.0000000 0.9995697 0.9999240 0.9999939 0.9998902 0.9999665
Drug_B 0.9995697 1.0000000 0.9998554 0.9994612 0.9998946 0.9997758
Drug_C 0.9999240 0.9998554 1.0000000 0.9998748 0.9999969 0.9999911
Drug_D 0.9999939 0.9994612 0.9998748 1.0000000 0.9998324 0.9999320
Drug_E 0.9998902 0.9998946 0.9999969 0.9998324 1.0000000 0.9999777
Drug_F 0.9999665 0.9997758 0.9999911 0.9999320 0.9999777 1.0000000
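
One caveat about values this close to 1: the five measurements are on very different scales (V4 ranges up to 120000, V3 only up to 1), so every drug's profile is dominated by the same large column and all rows look alike. A minimal sketch of one possible remedy, assuming you want each variable to contribute equally, is to standardize the columns with scale() before transposing. The seed and the names num, raw_cor and scaled_cor are illustrative assumptions, not part of the original code:

```r
# Sketch: standardize each measurement before correlating the drugs.
set.seed(42)  # assumption: seed added only so the sketch is reproducible

df <- data.frame(
  Treatment     = c("Drug A", "Drug B", "Drug C", "Drug D", "Drug E", "Drug F"),
  Measurment_V1 = runif(6, 0, 3000),
  Measurment_V2 = runif(6, 0, 20),
  Measurment_V3 = runif(6, 0, 1),
  Measurment_V4 = runif(6, 0, 120000),
  Measurment_V5 = runif(6, 0, 100)
)

# Numeric part as a matrix, with the drug names as row names.
num <- as.matrix(df[, -1])
rownames(num) <- as.character(df$Treatment)

# Drug-by-drug correlations on the raw values (dominated by Measurment_V4).
raw_cor <- cor(t(num))

# scale() centers each column and divides it by its standard deviation,
# so all measurements are on a comparable scale before transposing.
scaled_cor <- cor(t(scale(num)))

round(raw_cor, 3)
round(scaled_cor, 3)
```

Whether standardizing is appropriate depends on what the measurements mean; with random data like this, the scaled correlations simply spread out over the range from -1 to 1.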

You can then use the above correlation matrix to plot a heatmap.
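
For the plotting step, here is a minimal sketch using base R's heatmap() rather than Hmisc, so that no extra package is needed. The seed is an assumption added for reproducibility, and the drug names are taken from the Treatment column instead of being typed by hand:

```r
# Sketch: heatmap of a drug-by-drug correlation matrix (base R only).
set.seed(1)  # assumption: seed added only so the sketch is reproducible

df <- data.frame(
  Treatment     = c("Drug A", "Drug B", "Drug C", "Drug D", "Drug E", "Drug F"),
  Measurment_V1 = runif(6, 0, 3000),
  Measurment_V2 = runif(6, 0, 20),
  Measurment_V3 = runif(6, 0, 1),
  Measurment_V4 = runif(6, 0, 120000),
  Measurment_V5 = runif(6, 0, 100)
)

# Transpose the numeric columns so each drug becomes a column, then correlate.
dft <- as.data.frame(t(df[, -1]))
names(dft) <- as.character(df$Treatment)
cor_mat <- cor(dft)

# symm = TRUE treats the matrix as symmetric (same ordering on both axes);
# scale = "none" plots the correlations as they are, without row rescaling.
heatmap(cor_mat, symm = TRUE, scale = "none", margins = c(7, 7))
```

Using df[, -1] instead of df[, 2:6] selects every column except Treatment, which should carry over unchanged to a data frame with a few hundred measurement columns.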

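Note that I used data.frame() to rebuild your data frame. cbind() returns a matrix, and because Treatment is a character vector, every measurement is coerced to text as well; data.frame() keeps the measurement columns numeric, which cor() requires.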
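
A small sketch of that difference, using two made-up rows (the values and the names via_cbind and via_data_frame are illustrative assumptions):

```r
# Sketch: compare column types from cbind() versus data.frame().
Treatment     <- c("Drug A", "Drug B")
Measurment_V1 <- c(10.5, 20.25)

# cbind() builds a single character matrix, so the numbers become text
# (character columns in R >= 4.0, factor columns in older versions).
via_cbind <- as.data.frame(cbind(Treatment, Measurment_V1))

# data.frame() stores each vector as its own column and keeps its type.
via_data_frame <- data.frame(Treatment, Measurment_V1)

sapply(via_cbind, class)
sapply(via_data_frame, class)

is.numeric(via_cbind$Measurment_V1)       # FALSE
is.numeric(via_data_frame$Measurment_V1)  # TRUE
```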
