F612

Reputation: 566

Julia DataFrame of correlation matrix: how to extract highly correlated cell values and columns?

New to Julia. I'm working on a correlation matrix. I've converted it into a DataFrame to include feature names. To find which features are highly correlated, I need the names of the features and their correlation values. I find the high values using the following:

corr_matrix_df=cor(Matrix(df))

idx_hcorr=findall(x->abs.(x)>0.6, corr_matrix_df)

But I don't know how to get the column names. If I sort it column-wise, the feature rows get shuffled incorrectly. Any ideas?

Upvotes: 2

Views: 958

Answers (1)

Bogumił Kamiński

Reputation: 69949

Here is how you can do it:

julia> using DataFrames, Random

julia> Random.seed!(1234)
MersenneTwister(1234)

julia> df = DataFrame(rand(5, 5), :auto)
5×5 DataFrame
 Row │ x1        x2        x3         x4         x5
     │ Float64   Float64   Float64    Float64    Float64
─────┼────────────────────────────────────────────────────
   1 │ 0.590845  0.854147  0.648882   0.112486   0.950498
   2 │ 0.766797  0.200586  0.0109059  0.276021   0.96467
   3 │ 0.566237  0.298614  0.066423   0.651664   0.945775
   4 │ 0.460085  0.246837  0.956753   0.0566425  0.789904
   5 │ 0.794026  0.579672  0.646691   0.842714   0.82116

julia> using Statistics

julia> cm = cor(Matrix(df))
5×5 Matrix{Float64}:
  1.0       0.101686    -0.420953   0.562488     0.2127
  0.101686  1.0          0.378276   0.00772785   0.100182
 -0.420953  0.378276     1.0       -0.327604    -0.791489
  0.562488  0.00772785  -0.327604   1.0         -0.0746962
  0.2127    0.100182    -0.791489  -0.0746962    1.0

julia> high = findall(x -> abs(x) > 0.6, cm)
7-element Vector{CartesianIndex{2}}:
 CartesianIndex(1, 1)
 CartesianIndex(2, 2)
 CartesianIndex(3, 3)
 CartesianIndex(5, 3)
 CartesianIndex(4, 4)
 CartesianIndex(3, 5)
 CartesianIndex(5, 5)

julia> [[names(df, idx.I[1]); names(df, idx.I[2])] for idx in high]
7-element Vector{Vector{String}}:
 ["x1", "x1"]
 ["x2", "x2"]
 ["x3", "x3"]
 ["x5", "x3"]
 ["x4", "x4"]
 ["x3", "x5"]
 ["x5", "x5"]

Is this what you wanted? (I added one step after your last step.)
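The result above lists the diagonal (every column correlates perfectly with itself) and each off-diagonal pair twice, and it carries names but not values. If you also want the correlation values, one possible sketch is to walk only the upper triangle of the matrix and collect names and values together. The helper name `high_correlations` and its `threshold` keyword are hypothetical names chosen for illustration, not part of DataFrames.jl or Statistics, and the sketch assumes every column of `df` is numeric:

```julia
using DataFrames, Statistics

# Hypothetical helper (not a DataFrames.jl function): returns one row per
# unordered pair of distinct columns whose absolute correlation exceeds
# `threshold`. Assumes all columns of `df` are numeric.
function high_correlations(df::AbstractDataFrame; threshold::Real = 0.6)
    cm = cor(Matrix(df))
    cols = names(df)
    feature1 = String[]
    feature2 = String[]
    correlation = Float64[]
    # Visit only i < j, which skips the diagonal and the mirrored entries.
    for j in 2:length(cols), i in 1:j-1
        if abs(cm[i, j]) > threshold
            push!(feature1, cols[i])
            push!(feature2, cols[j])
            push!(correlation, cm[i, j])
        end
    end
    return DataFrame(feature1 = feature1, feature2 = feature2,
                     correlation = correlation)
end

# Small deterministic example: `b` is exactly 2 * `a`, while `c` is only
# weakly correlated with both, so only the a/b pair is kept.
toy = DataFrame(a = [1.0, 2.0, 3.0, 4.0],
                b = [2.0, 4.0, 6.0, 8.0],
                c = [1.0, -1.0, 1.0, -1.0])
res = high_correlations(toy)
```

Applied to the 5×5 matrix printed above, this would keep only the x3/x5 pair (correlation -0.791489), since that is the only off-diagonal entry whose absolute value exceeds 0.6.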

Upvotes: 2
