Reputation: 7986
My problem may not be well suited to SO, but I am looking for a solution (in R or Python, preferably R) to create heatmaps for data that has two extreme ends. Consider the following data.
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| … | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| 1 | 0.960023745 | 0.006412462 | 0.002413886 | 1.75E-06 | 1.33E-07 | 6.53E-07 | 0.000789362 | 1.56E-07 | 0.027248026 | 2.54E-05 | 0.000108822 | 0.002949816 |
| 2 | 0.013783554 | 0.960582857 | 0.010711838 | 0.003933983 | 0.002573642 | 0.001472307 | 0.000319789 | 0.000195265 | 1.87E-05 | 1.29E-06 | 0.004194081 | 0.002209041 |
| 3 | 0.000839561 | 0.005466858 | 0.944159921 | 0.023892784 | 0.001752099 | 0.000828122 | 0.000493376 | 1.84E-06 | 0.011739846 | 0.000879784 | 9.53E-05 | 0.00980562 |
| 4 | 2.26E-08 | 0.004108291 | 0.010781282 | 0.966410413 | 0.010459999 | 3.04E-05 | 1.64E-06 | 0.001983494 | 0 | 0.000225223 | 0.002846474 | 0.0031448 |
| 5 | 0 | 0.003175902 | 0.002023363 | 0.010022482 | 0.919020424 | 0.032083951 | 0.001814906 | 0.030203657 | 2.02E-06 | 7.07E-05 | 0.001165208 | 0.000413012 |
| 6 | 7.34E-08 | 0.002817014 | 0.000931738 | 7.01E-05 | 0.026999736 | 0.947850807 | 0.003017895 | 0.017994113 | 0 | 0.00011791 | 0.000194055 | 0 |
| 7 | 0.001857195 | 0.000220267 | 0.001523402 | 1.23E-05 | 0.001915852 | 0.010193007 | 0.960227998 | 0.012040256 | 0.007093175 | 0.001441301 | 0.002149965 | 0.001306157 |
| 8 | 0 | 0.000337953 | 0 | 0.00536237 | 0.030409165 | 0.01670267 | 0.009929247 | 0.936720524 | 0 | 0 | 0.000503316 | 3.12E-05 |
| 9 | 0.00350741 | 2.38E-06 | 0.002294787 | 1.17E-06 | 9.38E-08 | 8.74E-08 | 0.000252812 | 4.25E-10 | 0.984092182 | 0.003173648 | 2.42E-05 | 0.006649569 |
| 10 | 0.000126558 | 4.85E-05 | 0.001686418 | 0.000202837 | 3.87E-05 | 9.82E-05 | 0.000425687 | 0 | 0.013116146 | 0.983428814 | 5.28E-05 | 0.000776452 |
| 11 | 0.000170592 | 0.002728779 | 0.000117028 | 0.002794149 | 0.000621607 | 0.000224662 | 0.000969203 | 0.000299963 | 0.000629235 | 4.68E-05 | 0.991344498 | 5.02E-05 |
| 12 | 0.004371355 | 0.001246307 | 0.02523568 | 0.007498292 | 0.000186287 | 6.00E-07 | 0.000956249 | 2.93E-05 | 0.0590514 | 0.001253133 | 8.40E-05 | 0.900059314 |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
Consider the first row. The X1 column entry is a very high number compared to the rest of the entries in that row. This goes for all the rows. The heat map this data generates looks like the following
As you can see, the diagonal is very strong compared to the other colors (this can be seen from the data and is actually expected). I am just trying to find a way to "darken" the other colors. I'm mainly looking for a ggplot solution. Nothing I've tried so far works.
The code for R right now is
heatmap(data.matrix(result_matrix), Rowv=NA, Colv=NA, col = rev(heat.colors(256)), margins=c(5,10))
Upvotes: 3
Views: 783
Reputation: 59335
The basic idea is to put the fill colors on a logarithmic scale. Here is a ggplot solution.
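library(ggplot2)
library(reshape2)
# df is assumed to be the data above as a data frame, e.g. as.data.frame(result_matrix)
df$id <- rownames(df)
# reshape to long format: one row per (id, variable, value) cell
gg <- melt(df, id="id")
ggplot(gg, aes(x=variable, y=id, fill=value))+
  geom_tile()+
  # log-scaled fill; cells that cannot be log-transformed are drawn white
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="log10", na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))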
The key here is trans="log10" in the call to scale_fill_gradientn(...). One problem with logs is that you have zeros in your data, which have no finite logarithm (log10(0) is -Inf) and so end up treated as missing. Using na.value="white" deals with that (you could make it another color if that was appropriate in your use case).
The calls to scale_x... and scale_y... are just to compress the axes so the tiles cover the whole plot (ggplot adds a bit of empty space by default, which is distracting in heatmaps).
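To make the effect of the log scale concrete, here is a small base R sketch using three values from the first row of the question's table plus a zero. The floor replacement at the end is not part of the answer above; it is only one hypothetical way to give zeros the lowest color instead of the missing-value color, and the choice of floor is an assumption.

```r
# Values taken from row 1 of the question's table, plus a zero.
x <- c(0.960023745, 0.006412462, 1.75e-06, 0)

# On a linear scale, every off-diagonal value sits close to the bottom.
linear_pos <- x / max(x)
print(linear_pos)

# log10 spreads the values over several orders of magnitude,
# but the zero becomes -Inf and cannot be placed on the color scale.
logged <- log10(x)
print(logged)

# Hypothetical alternative to na.value: clamp zeros to a small floor first.
# Assumption: the smallest positive value is an acceptable floor.
floor_value <- min(x[x > 0])
x_floored <- pmax(x, floor_value)
logged_floored <- log10(x_floored)
print(logged_floored)
```

Whether clamping is appropriate depends on whether a true zero should look different from a very small value; na.value keeps that distinction visible.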
EDIT: Response to OP's comment.
This business of "making the diagonal pop out more" is an aesthetic choice which has almost nothing to do with the data, and will probably lead to a misleading graphic. I do not recommend it. Having said that, you can always choose a different transformation.
# reorder the y-axis so the ids keep their original order rather than sorting as text
gg$id <- factor(gg$id, levels=unique(gg$id)) # should not be necessary...
# square root scale
ggplot(gg, aes(x=variable, y=id, fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="sqrt", na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))
# logit scale; need to set breaks=... to avoid labels overlapping
ggplot(gg, aes(x=variable, y=id, fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="logit", na.value="white", breaks=5*10^-(0:8))+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))
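As a rough way to compare these transformations without plotting, the sketch below computes where three representative values would land along the palette (0 = lowest color, 1 = highest) under each one. The values are rounded stand-ins for a diagonal entry and two off-diagonal entries, and pos() is a helper invented here for illustration. It only approximates what ggplot does, since the real limits come from the full data range.

```r
# Rounded stand-ins: a diagonal entry and two off-diagonal entries.
vals <- c(diag = 0.96, mid = 0.01, small = 1e-06)

# Hypothetical helper: rescale a vector to [0, 1], i.e. its palette position.
pos <- function(v) (v - min(v)) / (max(v) - min(v))

# Logit of a proportion (equivalent to stats::qlogis).
logit <- function(p) log(p / (1 - p))

positions <- rbind(
  linear = pos(vals),
  sqrt   = pos(sqrt(vals)),
  log10  = pos(log10(vals)),
  logit  = pos(logit(vals))
)
print(round(positions, 3))
```

The "mid" column shows how far each transformation lifts a value of 0.01 away from the bottom of the scale. The further it is lifted, the more the off-diagonal cells are darkened relative to the diagonal.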
Upvotes: 2