Hierarchical clustering on continuous heterogeneous variables with different range/scales in R

Question

I would like to use R to perform hierarchical clustering with two groups of variables describing the same samples. One group is microarray gene expression data (for specific genes) that have been normalized and batch-effect corrected. The other group consists of quantitative clinical parameters that describe the same samples. However, these clinical variables have not been normalized or transformed in any way (i.e. they are raw continuous values).

For example, one of these variables could range from 2 to 35, whereas another ranges from 0.1 to 0.9, etc.

My ultimate goal is to run hierarchical clustering on both groups simultaneously (merged in a single matrix/data frame), in order to inspect which of these clinical variables cluster with specific genes. So my questions are:

1) Is an initial transformation of the clinical variables necessary before merging them with the genes and performing the clustering? For example, a log2 transformation, which has also been applied to part of my gene expression data.

2) Or would row scaling (where the rows are all the features in the input data) account for this discrepancy?

3) For a similar analysis, such as constructing a correlation plot of all the above variables, would a simple scaling be sufficient?
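
To make the setup concrete, here is a minimal sketch with simulated data, not the real dataset. All object names (`genes`, `clinical`, `merged`, `clinA`, `clinB`) and the simulated values are hypothetical, and samples are placed in rows with variables in columns, which is an assumption about the layout. It standardises every variable with `scale()` and then clusters the variables using a correlation-based distance:

```r
# Simulated stand-in data (hypothetical): 20 samples, 5 genes, 2 clinical variables
set.seed(1)
n <- 20
genes <- matrix(rnorm(n * 5, mean = 8, sd = 1), nrow = n,
                dimnames = list(paste0("S", 1:n), paste0("gene", 1:5)))
clinical <- cbind(clinA = runif(n, 2, 35),     # raw values between 2 and 35
                  clinB = runif(n, 0.1, 0.9))  # raw values between 0.1 and 0.9
rownames(clinical) <- rownames(genes)

# Merge both groups: samples in rows, variables (genes + clinical) in columns
merged <- cbind(genes, clinical)

# Standardise each variable (column) to mean 0 and standard deviation 1
merged_z <- scale(merged)

# Cluster the variables using 1 - Pearson correlation as the distance
var_cor  <- cor(merged_z, method = "pearson")
var_dist <- as.dist(1 - var_cor)
hc <- hclust(var_dist, method = "average")
```

Calling `plot(hc)` would then draw the dendrogram of genes and clinical variables together. Note that Pearson correlation is unchanged by linear rescaling of a variable, so in this sketch `cor(merged)` and `cor(merged_z)` give the same matrix; the `scale()` step would only change the result with a distance such as Euclidean (e.g. `dist(t(merged_z))`). Average linkage is just one possible choice here.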


