chazmatazz

Reputation: 133

Why does this correlation matrix not rearrange (corrr)?

Using corrr to produce a Pearson correlation matrix, I get a nice data frame and can rearrange it into an organised-looking matrix. However, when I plot this with rplot, the rearrangement seems to be thrown out.

Here is a subset of the correlation data frame, with the matrix run:

data <- select(data, c(npqmax, npq_end, npq_slope_up, pi, fvfm, phipsii_end))

> data
# A tibble: 861 x 6
   npqmax npq_end npq_slope_up    pi  fvfm phipsii_end
    <dbl>   <dbl>        <dbl> <dbl> <dbl>       <dbl>
 1   2.60   0.866         1.25 0.805 0.745       0.492
 2   2.92   1.02          1.27 0.801 0.753       0.485
 3   2.95   0.881         1.33 0.832 0.752       0.518
 4   2.56   0.846         1.34 0.811 0.736       0.488
 5   2.68   0.822         1.52 0.820 0.738       0.499
 6   2.58   0.876         1.32 0.809 0.740       0.486
 7   2.82   0.908         1.14 0.824 0.749       0.505
 8   2.93   0.997         1.29 0.803 0.749       0.476
 9   2.71   0.936         1.51 0.819 0.740       0.490
10   2.80   0.844         1.40 0.837 0.754       0.527
# ... with 851 more rows

### next run Pearson correlation

cormat <- correlate(data)

> cormat
# A tibble: 6 x 7
  rowname       npqmax npq_end npq_slope_up      pi    fvfm phipsii_end
  <chr>          <dbl>   <dbl>        <dbl>   <dbl>   <dbl>       <dbl>
1 npqmax       NA       0.240        0.0103  0.0820  0.249       0.0582
2 npq_end       0.240  NA            0.193  -0.716  -0.0492     -0.729 
3 npq_slope_up  0.0103  0.193       NA      -0.167  -0.293      -0.261 
4 pi            0.0820 -0.716       -0.167  NA       0.383       0.918 
5 fvfm          0.249  -0.0492      -0.293   0.383  NA           0.614 
6 phipsii_end   0.0582 -0.729       -0.261   0.918   0.614      NA   

### make a nice rearrangement 

cormat2 <- cormat %>%
  rearrange(method = "MDS", absolute = FALSE) %>% 
  shave()

> cormat2
# A tibble: 6 x 7
  rowname      npq_end npq_slope_up  npqmax   fvfm     pi phipsii_end
  <chr>          <dbl>        <dbl>   <dbl>  <dbl>  <dbl>       <dbl>
1 npq_end      NA           NA      NA      NA     NA              NA
2 npq_slope_up  0.193       NA      NA      NA     NA              NA
3 npqmax        0.240        0.0103 NA      NA     NA              NA
4 fvfm         -0.0492      -0.293   0.249  NA     NA              NA
5 pi           -0.716       -0.167   0.0820  0.383 NA              NA
6 phipsii_end  -0.729       -0.261   0.0582  0.614  0.918          NA

Now I would plot this with rplot(shape = 15, colors = c("red", "green")) but instead of getting a plot like what's found on the corrr creator's blog:

[image: example heatmap]

I get something that looks not-so-arranged:

[image: full dataset matrix plot]

Any idea what's going wrong?

Thanks.

Upvotes: 2

Views: 240

Answers (1)

StupidWolf

Reputation: 46888

I used mtcars which is also the example in corrr's blog, and I get the same results:

library(corrr)
library(dplyr)
library(ggplot2)

cormat <- correlate(mtcars)
cormat2 <- cormat %>%
  rearrange(method = "MDS", absolute = FALSE) %>%
  shave()

cormat2 %>% rplot(shape = 15, colors = c("red", "green"))


Compare the plot with your matrix data frame: cells that are NA after shave() now show values in the plot, which means the row order gets scrambled during plotting. This should be reported to the package author. In the meantime, below I make a few alterations to corrr:::rplot.cor_df:

newplot = function (rdf, legend = TRUE, shape = 16, colours = c("indianred2", 
    "white", "skyblue1"), print_cor = FALSE, colors) 
{
    if (!missing(colors)) 
        colours <- colors
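    # Remember the rearranged variable order before melting
    # (newer corrr releases name this column `term` instead of `rowname`)
    row_order <- rdf$rowname
    pd <- stretch(rdf, na.rm = TRUE)
    # Restore that order on both axes; y is reversed so the diagonal runs top-left to bottom-right
    pd$x <- factor(pd$x, levels = row_order)
    pd$y <- factor(pd$y, levels = rev(row_order))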
    pd$size = abs(pd$r)
    pd$label = fashion(pd$r)
    plot_ <- list(geom_point(shape = shape), if (print_cor) geom_text(color = "black", 
        size = 3, show.legend = FALSE), scale_colour_gradientn(limits = c(-1, 
        1), colors = colours), theme_classic(), labs(x = "", 
        y = ""), guides(size = "none", alpha = "none"), if (legend) labs(colour = NULL), 
        if (!legend) theme(legend.position = "none"))
    ggplot(pd, aes_string(x = "x", y = "y", color = "r", size = "size", 
        alpha = "size", label = "label")) + plot_
}

newplot(cormat2,shape=15,colours=c("#29c7ac","#c02739"))


A quick explanation: in the function above, the line stretch(rdf, na.rm = TRUE) melts the correlation data frame into long format, but the order of your variables is not retained. I just added two lines that turn x and y back into factors whose levels follow the rearranged row order. There are other ways to do this, but for your purpose this should be OK.
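To see why setting the factor levels matters, here is a minimal base-R sketch (no corrr needed). It assumes the x column coming out of stretch() is a plain character vector; the vector x_chr below is made up for illustration, and row_order is copied from the rearranged cormat2 in the question. Without explicit levels, factor() sorts the names, and a discrete plot axis would be expected to follow that sorted order rather than the rearranged one.

```r
# Variable order after rearrange(), taken from the question's cormat2
row_order <- c("npq_end", "npq_slope_up", "npqmax", "fvfm", "pi", "phipsii_end")

# Hypothetical long-format column of plain strings, standing in for pd$x
x_chr <- c("pi", "npq_end", "fvfm", "npqmax", "phipsii_end", "npq_slope_up")

# Default conversion: levels come out sorted (exact order depends on locale)
default_levels <- levels(factor(x_chr))

# Explicit levels: the rearranged order is kept, as in the two lines added to newplot()
x_fct <- factor(x_chr, levels = row_order)
ordered_levels <- levels(x_fct)

print(default_levels)
print(ordered_levels)
```

The second print shows the same order as row_order, which is what the plot axes then follow.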

Upvotes: 2
