Reputation: 15
I have a data.frame with 302 rows and 14 columns. The content of the data.frame is coefficients from 14 previous regression analyses and I am looking for a way to plot the entire data.frame, such that the coefficients are highlighted in shades of red and blue (negative and positive numbers respectively, 0's should be white).
Neither the row and column names nor the actual coefficients should be shown in the plot, but I would like to be able to add thicker lines at certain columns and rows. The data.frame is set up so that rows and columns are grouped theoretically, so adding lines around these groupings would help emphasize this.
I have already tried corrplot and ggplot. corrplot(df, is.corr = FALSE) gave me something close to what I want, but the plot was way too long (due to the 302 rows). If possible, the rows should automatically adjust their height so that the whole plot is visible. My main goal is to visually examine possible patterns in the colors.
Below is a snippet of my data.
ingen0 kommune3 kommune8 kommune9 diagnose1 diagnose2 diagnose7 diagnose12 diagose13 psyk5 psyk9 psyk10 krim4 krim6
abdominalomfang 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.002
adoption1 0.000 0.000 0.274 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
adoptions_anbr1 0.000 0.965 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.585
afsonforfods_mor1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.017
afsonforfodsfarr1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.183
agteskab_laengde_far 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.001 0.000 0.000 0.000 0.000 -0.008
agteskab_laengde_mor 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.002
akutkejsfoed1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.127 0.000 0.000 0.000 0.000
alder_far -0.003 0.000 0.009 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.000 -0.001
alder_mor 0.000 0.000 0.004 0.000 0.000 -0.025 0.000 0.000 0.000 0.004 0.000 0.000 -0.007 -0.012
alm_lage_sysi_far -0.008 0.000 0.005 0.000 0.001 0.004 0.002 0.006 0.000 0.000 0.467 0.003 0.000 0.003
alm_lage_sysi_mor -0.007 0.000 0.009 0.003 0.000 0.006 0.003 0.006 -0.002 0.006 0.003 0.005 0.000 0.002
anbringelse1 -2.009 0.005 -1.696 -0.092 0.260 0.217 0.000 0.000 0.000 0.213 -0.092 -0.175 -0.392 0.169
anholdtforfods_far1 0.000 0.000 0.000 0.000 0.000 0.107 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.131
anholdtforfods_mor1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.214
antaldiag_far -0.006 0.000 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.051
antaldiag_mor 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
antdage_t_far 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
antdage_t_mor 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001
apgarscore_efter5minutter 0.047 -0.091 -0.044 0.000 0.000 -0.027 0.000 -0.010 0.009 0.000 0.000 0.000 0.000 0.005
The following can be used to reproduce the corrplot. I have not managed to produce anything successful in ggplot.
A <- structure(list(ingen0 = c(0, 0, 0, 0, 0, 0, 0, 0, -0.003, 0,
-0.008, -0.007, -2.009, 0, 0, -0.006, 0, 0, 0, 0.047), kommune3 = c(0,
0, 0.965, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.005, 0, 0, 0, 0, 0, 0,
-0.091), kommune8 = c(0, 0.274, 0, 0, 0, 0, 0, 0, 0.009, 0.004,
0.005, 0.009, -1.696, 0, 0, 0.019, 0, 0, 0, -0.044), kommune9 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.003, -0.092, 0, 0, 0, 0, 0, 0,
0), diagnose1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0.001, 0, 0.001, 0,
0.26, 0, 0, 0, 0, 0, 0, 0), diagnose2 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, -0.025, 0.004, 0.006, 0.217, 0.107, 0, 0, 0, 0, 0, -0.027
), diagnose7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.002, 0.003,
0, 0, 0, 0, 0, 0, 0, 0), diagnose12 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0.006, 0.006, 0, 0, 0, 0, 0, 0, 0, -0.01), diagose13 = c(0,
0, 0, 0, 0, -0.001, 0, 0, 0, 0, 0, -0.002, 0, 0, 0, 0, 0, 0,
0, 0.009), psyk5 = c(0, 0, 0, 0, 0, 0, 0, -0.127, 0, 0.004, 0,
0.006, 0.213, 0, 0, 0, 0, 0, 0, 0), psyk9 = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0.467, 0.003, -0.092, 0, 0, 0, 0, 0, 0, 0), psyk10 = c(0,
0, 0, 0, 0, 0, 0, 0, 0.002, 0, 0.003, 0.005, -0.175, 0, 0, 0,
0, 0, 0, 0), krim4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, -0.007, 0,
0, -0.392, 0, 0, 0, 0, 0, 0, 0), krim6 = c(0.002, 0, 0.585, -0.017,
0.183, -0.008, -0.002, 0, -0.001, -0.012, 0.003, 0.002, 0.169,
0.131, -0.214, 0.051, 0, 0, 0.001, 0.005)), row.names = c("abdominalomfang",
"adoption1", "adoptions_anbr1", "afsonforfods_mor1", "afsonforfodsfarr1",
"agteskab_laengde_far", "agteskab_laengde_mor", "akutkejsfoed1",
"alder_far", "alder_mor", "alm_lage_sysi_far", "alm_lage_sysi_mor",
"anbringelse1", "anholdtforfods_far1", "anholdtforfods_mor1",
"antaldiag_far", "antaldiag_mor", "antdage_t_far", "antdage_t_mor",
"apgarscore_efter5minutter"), class = "data.frame")
library(corrplot)
corrplot(A, is.corr = FALSE)
The issue with the above is, as mentioned, the number of rows (302) in my original data.frame: the plot gets overcrowded, and I cannot, for example, add the lines I want. I am therefore looking for other options.
Upvotes: 0
Views: 78
Reputation: 66900
302 categories is a lot to display, especially if you want to see the categories. (We'd typically need ~10 pages to show that many lines of text.)
One approach could be to make an interactive plot where you hover to see categories:
First, some fake data:
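library(tidyverse)
# colors() returns 657 built-in color names, used here as stand-in row labels;
# 657 categories x 14 columns = 9198 fake coefficients in long format
df <- data.frame(
category = rep(colors(), each = 14),
col = letters[1:14],
cor = rnorm(9198)
)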
Then plotting as a ggplot tile grid:
# store the plot so it can be reused outside an interactive session
p <- ggplot(df, aes(col, category, fill = cor)) +
geom_tile() +
scale_fill_gradient2(low = "red", mid = "white", high = "blue")
p
plotly::ggplotly(p)
The category labels on the left are heavily overplotted and unreadable (it might be worth adding theme(axis.text.y = element_blank()) to hide them), but you can still explore the plot interactively with plotly.
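If a static plot is enough, the same tile approach could be applied to a wide data.frame like the one in the question, with the labels hidden and thicker lines drawn between groups. The following is only a minimal sketch: `A` is a small stand-in built from a few values in the question's snippet, and the group boundaries (`col_after`, `row_after`) are hypothetical positions that you would replace with your own theoretical groupings.

```r
library(ggplot2)

# Small stand-in for the question's wide data.frame
# (a handful of rows and columns copied from the posted snippet).
A <- data.frame(
  ingen0   = c(0, 0, 0, -0.003, -2.009),
  kommune3 = c(0, 0, 0.965, 0, 0.005),
  psyk5    = c(0, 0, 0, 0, 0.213),
  krim6    = c(0.002, 0, 0.585, -0.001, 0.169),
  row.names = c("abdominalomfang", "adoption1", "adoptions_anbr1",
                "alder_far", "anbringelse1")
)

# Reshape wide -> long: one row per (predictor, model) cell.
# Factor levels preserve the original ordering; rows are reversed so the
# first row of the data.frame ends up at the top of the plot.
m <- as.matrix(A)
long <- data.frame(
  predictor = factor(rownames(m)[row(m)], levels = rev(rownames(m))),
  model     = factor(colnames(m)[col(m)], levels = colnames(m)),
  coef      = as.vector(m)
)

# Hypothetical group boundaries: draw a line after these column / row numbers
# (counted from the left and from the top of the original data.frame).
col_after <- c(1, 2, 3)
row_after <- c(3)

# Discrete axes place category i at position i, so a boundary sits at i + 0.5.
# The y axis is reversed, hence the conversion for rows.
col_lines <- col_after + 0.5
row_lines <- nrow(m) - row_after + 0.5

p <- ggplot(long, aes(model, predictor, fill = coef)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue",
                       midpoint = 0) +
  # `linewidth` needs ggplot2 >= 3.4.0; older versions use `size` instead
  geom_vline(xintercept = col_lines, linewidth = 1) +
  geom_hline(yintercept = row_lines, linewidth = 1) +
  theme(axis.text  = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank())

p
```

Because the tiles stretch to fill the plotting area, all 302 rows should fit in whatever device size you choose, for example via ggsave("coef_heatmap.png", p, width = 6, height = 10), where the file name and dimensions are just placeholders. One caveat: a single large coefficient such as -2.009 will stretch the color scale so that small values look almost white; setting limits together with oob = scales::squish in scale_fill_gradient2() is one way to keep small effects visible.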
Upvotes: 2