Reputation: 566
I'm new to Julia and I'm working on a correlation matrix. I've converted it into a DataFrame so that it includes feature names. To find which features are highly correlated, I need both the names of the features and their correlation values. I get the indices of the high values using the following:
corr_matrix_df=cor(Matrix(df))
idx_hcorr=findall(x->abs.(x)>0.6, corr_matrix_df)
But I don't know how to get the column names. If I sort it column-wise, the feature rows will get shuffled incorrectly. Any ideas?
Upvotes: 2
Views: 958
Reputation: 69949
Here is how you can do it:
julia> using DataFrames, Random
julia> Random.seed!(1234)
MersenneTwister(1234)
julia> df = DataFrame(rand(5, 5), :auto)
5×5 DataFrame
 Row │ x1        x2        x3         x4         x5
     │ Float64   Float64   Float64    Float64    Float64
─────┼────────────────────────────────────────────────────
   1 │ 0.590845  0.854147  0.648882   0.112486   0.950498
   2 │ 0.766797  0.200586  0.0109059  0.276021   0.96467
   3 │ 0.566237  0.298614  0.066423   0.651664   0.945775
   4 │ 0.460085  0.246837  0.956753   0.0566425  0.789904
   5 │ 0.794026  0.579672  0.646691   0.842714   0.82116
julia> using Statistics
julia> cm = cor(Matrix(df))
5×5 Matrix{Float64}:
  1.0        0.101686    -0.420953   0.562488    0.2127
  0.101686   1.0          0.378276   0.00772785  0.100182
 -0.420953   0.378276     1.0       -0.327604   -0.791489
  0.562488   0.00772785  -0.327604   1.0        -0.0746962
  0.2127     0.100182    -0.791489  -0.0746962   1.0
julia> high = findall(x -> abs(x) > 0.6, cm)
7-element Vector{CartesianIndex{2}}:
CartesianIndex(1, 1)
CartesianIndex(2, 2)
CartesianIndex(3, 3)
CartesianIndex(5, 3)
CartesianIndex(4, 4)
CartesianIndex(3, 5)
CartesianIndex(5, 5)
julia> [[names(df, idx.I[1]); names(df, idx.I[2])] for idx in high]
7-element Vector{Vector{String}}:
["x1", "x1"]
["x2", "x2"]
["x3", "x3"]
["x5", "x3"]
["x4", "x4"]
["x3", "x5"]
["x5", "x5"]
Is this what you wanted? (I added one step after your last step.)
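Since the question also asks for the correlation value, and the result above includes the diagonal (every feature paired with itself) as well as each pair twice, here is a minimal sketch of one possible follow-up. The helper name `high_correlations` and its `threshold` keyword are my own, hypothetical names, not part of DataFrames.jl; it assumes all columns of `df` are numeric.

```julia
using DataFrames, Statistics

# Hypothetical helper (an illustration, not a DataFrames.jl function):
# returns one row per pair of distinct columns whose absolute
# correlation exceeds `threshold`.
function high_correlations(df::AbstractDataFrame; threshold::Real = 0.6)
    cm = cor(Matrix(df))   # correlation matrix, columns in the same order as df
    cols = names(df)       # column names as strings
    result = DataFrame(feature1 = String[], feature2 = String[],
                       correlation = Float64[])
    # Visit only the strict upper triangle (i < j): this skips the
    # diagonal and avoids listing each symmetric pair twice.
    for j in 2:size(cm, 2), i in 1:(j - 1)
        if abs(cm[i, j]) > threshold
            push!(result, (cols[i], cols[j], cm[i, j]))
        end
    end
    return result
end
```

Calling `high_correlations(df)` returns a DataFrame with the two feature names and their correlation in each row. For the correlation matrix shown above, the only off-diagonal entry above 0.6 in absolute value is the pair `x3` and `x5`, at about -0.79.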
Upvotes: 2