Emma

Reputation: 33

Creating a heatmap from Mortality Data?

I have my mortality data in two formats. The first is the list form you get from The Human Mortality Database, with Female, Male and combined (Total) rates all in columns. The second is separated into Male and Female matrices, each holding just age, year and the mortality rate.

The first format is along the lines of

Year Age   Female     Male    Total  
1961  99     0.3       0.4     0.3  
1961  98     0.4       0.5     0.4  

etc.

The second format I separated to get data in the form of:

 Age 1961  1962  1963 .....  
  0  0.02  0.02  0.02 ...  
  1  0.002 0.002 0.002....  

etc.

I would like to be able to plot a heatmap so I can look at the cohort effects etc.

I have tried various methods found by searching online but these aren't working for the way my data is presented. The heatmaps I've produced come out completely red. Can anyone help?

I've tried this:

rnames <- France[,1]   #assign labels in column 1 to "rnames"
mat_data <- data.matrix(France[,2:ncol(France)])
rownames(mat_data) <- rnames #assign row names
col_breaks = c(seq(-1,0,length=100),  # for red
  seq(0,0.8,length=100),              # for yellow
  seq(0.8,1,length=100))              # for green
my_palette <- colorRampPalette(c("red", "yellow", "green"))(n = 299)
png("location",    # create PNG for the heat map        
  width = 5*300,        # 5 x 300 pixels
  height = 5*300,
  res = 300,            # 300 pixels per inch
  pointsize = 8)        # smaller font size

heatmap.2(mat_data,
cellnote=mat_data,
main="Correlation",
notecol="black",
trace="none",
margins =c(12,9),
col=my_palette,
breaks=col_breaks,
dendrogram="row",
Colv="NA")
dev.off()

This creates a solid red heatmap, with the years listed along the bottom, the word Age next to the years, and the actual ages listed along the y-axis. It also gives me an error:

Error in seq.default(min.raw, max.raw, by = min(diff(breaks)/4)) : 
invalid (to - from)/by in seq(.)

Does anyone know of a better way of producing the heatmap or what I've done wrong here?

Upvotes: 1

Views: 621

Answers (2)

Heroka

Reputation: 13149

Is this in any way helpful? I based it on what your data looks like, and generated some data to match. Then I started with a plot with 'year' on the x-axis and 'age' on the y-axis and a square (geom_tile) for each point. Those squares are coloured according to the 'total'. It doesn't have any polygons like the example you gave, but I think with your real data it would enable you to look for cohort effects.

#generate some data ranging from 0 to 0.1
set.seed(1000)
France <- expand.grid(Year=1961:2000,Age=20:98)
France$Female <- runif(nrow(France),0,0.05)
France$Male <- runif(nrow(France),0,0.05)
France$Total <- France$Male + France$Female


library(ggplot2)

p1 <- ggplot(France, aes(x=Year,y=Age,fill=Total)) + 
  geom_tile()+ 
  scale_fill_gradientn(colours=rainbow(10))
p1
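If your data is in the second (wide) format instead, it has to be reshaped to one row per age and year before geom_tile can use it. The following is only a minimal base-R sketch under assumptions: the data frame `wide` and its values are made up to mimic the layout in the question, and the names `long` and `Rate` are hypothetical.

```r
# Hypothetical wide table in the question's second format:
# an Age column plus one column per year (values are made up).
wide <- data.frame(Age = 0:2,
                   `1961` = c(0.02, 0.002, 0.001),
                   `1962` = c(0.03, 0.003, 0.002),
                   check.names = FALSE)

# Every column except Age is assumed to be a year.
year_cols <- setdiff(names(wide), "Age")

# Reshape to long format: one row per (Age, Year) pair.
# unlist() walks the columns one after another, so Age repeats
# once per year column and each Year repeats once per age row.
long <- data.frame(
  Age  = rep(wide$Age, times = length(year_cols)),
  Year = rep(as.integer(year_cols), each = nrow(wide)),
  Rate = unlist(wide[year_cols], use.names = FALSE)
)
```

With the real data, `long` could then be passed to the same kind of plot as above, using `fill=Rate` in place of `fill=Total`.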


Upvotes: 1

TayTay

Reputation: 7170

From the source code:

z <- seq(min.raw, max.raw, by=min(diff(breaks)/4))

The heatmap.2 code is internally calling the seq function and produces the error you're experiencing:

Error in seq.default(min.raw, max.raw, by = min(diff(breaks)/4)) : 
    invalid (to - from)/by in seq(.)

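What are min.raw and max.raw, though? Scroll up a bit (line 640) and you'll see they are the min and max of the breaks arg you passed in (which in this case are -1 and 1 respectively). Your breaks vector contains repeated values (0 and 0.8 each appear twice, where one seq ends and the next begins), so diff(breaks) contains zeros and the by parameter in the internal seq function evaluates to 0: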

min(diff(breaks)/4)

In fact, you can replicate this error if you try to construct a seq function with these parameters:

> seq(-1, 1, by=0)
Error in seq.default(-1, 1, by = 0) : invalid (to - from)/by in seq(.)
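To check this against the question's own breaks, here is a small sketch that rebuilds `col_breaks` exactly as posted and inspects it. The names `smallest_step` and `err` are just illustrative, and the tryCatch is only there so the snippet runs to the end instead of stopping at the error.

```r
# The break points exactly as defined in the question.
col_breaks <- c(seq(-1, 0, length = 100),
                seq(0, 0.8, length = 100),
                seq(0.8, 1, length = 100))

# 0 and 0.8 each appear twice, so there are duplicated break points.
duplicated_values <- col_breaks[duplicated(col_breaks)]

# The step that heatmap.2 hands to seq() as 'by'.
smallest_step <- min(diff(col_breaks) / 4)

# Capture the error message instead of aborting the script.
err <- tryCatch(seq(min(col_breaks), max(col_breaks), by = smallest_step),
                error = function(e) conditionMessage(e))

print(duplicated_values)
print(smallest_step)
print(err)
```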

There are two implications here. First, you've uncovered a corner case that breaks that code; this is a bug that should probably be reported on the GitHub repository (e.g., if this evaluates to 0, use some predefined by param). Second, you could use a uniform breaks parameter or just not define it. It is, after all, an optional parameter. From the documentation:

breaks
(optional) Either a numeric vector indicating the splitting points for binning x
into colors, or a integer number of break points to be used, in which case the break
points will be spaced equally between min(x) and max(x).

By leaving breaks blank or providing a single value, you shouldn't encounter this problem.
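If you do want to keep custom, non-uniform breaks, one possible workaround (a sketch, not something taken from the gplots documentation) is to drop the repeated break points and size the palette to match, since heatmap.2 expects one more break than there are colours. The variable `segments` is a hypothetical name; `col_breaks` and `my_palette` follow the question.

```r
# Same three segments as in the question, built in one vector.
segments <- c(seq(-1, 0, length.out = 100),
              seq(0, 0.8, length.out = 100),
              seq(0.8, 1, length.out = 100))

# Remove the repeated 0 and 0.8 so every step is strictly positive.
col_breaks <- unique(segments)

# One colour per interval between consecutive breaks.
my_palette <- colorRampPalette(c("red", "yellow", "green"))(length(col_breaks) - 1)

# The step heatmap.2 would compute is now greater than zero.
smallest_step <- min(diff(col_breaks) / 4)
print(smallest_step)
```

These could then be passed as `col=my_palette, breaks=col_breaks`. Note that mortality rates are never negative, so the segment from -1 to 0 is never used; starting the breaks at 0 would put the whole palette to work.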

Upvotes: 1
