alex

R - One-dimensional "heatmap" for categorical variables

I want to create a stack of 1D heatmaps that:

  1. show the centrality (e.g. the mean, represented by the highlight)
  2. show the dispersion (e.g. the standard deviation, represented by the shading gradient)

Note: the centrality and dispersion do not depend on the sample sizes. The bar length should be the same for every variable; the sample sizes need not be.

For example, it could look like this:

[Image: example of how the stacked 1D heatmaps could look]

Here is a minimal example with similar variables:

library(plyr)

v1 <- c("yes", "rather no", "yes", "yes", "yes", "rather yes", "rather yes", "rather no", "rather no", "no", "no", "no")
(v1 <- factor(v1, levels=c("no", "rather no", "rather yes", "yes"), ordered = TRUE)) # order factor values & show
# now, one way to re-code/transform the _ordered_ factors into numeric values
# (you may have a better proposal/opinion)
(v1n <- sapply(v1, function(x) as.numeric(as.character(mapvalues(x, from=c("no", "rather no", "rather yes", "yes"), to=c("0", "0.333", "0.666", "1")))))) # re-code to numeric & show
(v1n.mean <- mean(v1n)) # calculate mean & show
(v1n.sd   <- sd(v1n))   # calculate standard deviation & show

v2 <- c("rather yes", "rather yes", "rather no", "rather no", "rather no", "rather no", "rather no", "rather no", "rather no")
v2 <- factor(v2, levels=c("no", "rather no", "rather yes", "yes"), ordered = TRUE)
v2
v2n <- sapply(v2, function(x) as.numeric(as.character(mapvalues(x, from=c("no", "rather no", "rather yes", "yes"), to=c("0", "0.333", "0.666", "1")))))
v2n
(v2n.mean <- mean(v2n))
(v2n.sd   <- sd(v2n))

v3 <- c("yes", "yes", "yes", "rather yes", "rather yes", "rather yes", "rather no", "no")
v3 <- factor(v3, levels=c("no", "rather no", "rather yes", "yes"), ordered = TRUE)
v3
v3n <- sapply(v3, function(x) as.numeric(as.character(mapvalues(x, from=c("no", "rather no", "rather yes", "yes"), to=c("0", "0.333", "0.666", "1")))))
v3n
(v3n.mean <- mean(v3n))
(v3n.sd   <- sd(v3n))
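As a possibly simpler alternative to the `sapply()`/`mapvalues()` recoding above, the integer codes of an ordered factor can be rescaled directly. This is a sketch assuming the answer categories are equally spaced on [0, 1]; `to_unit_score` is a hypothetical helper name. It yields exact thirds (1/3, 2/3) rather than 0.333 and 0.666, so the means and SDs differ very slightly from the ones above.

```r
## Hypothetical helper: map an ordered factor onto equally spaced scores in [0, 1]
## (first level -> 0, last level -> 1). Base R only, no plyr needed.
to_unit_score <- function(f) {
  stopifnot(is.factor(f), nlevels(f) > 1)
  (as.integer(f) - 1) / (nlevels(f) - 1)
}

lev <- c("no", "rather no", "rather yes", "yes")
v1 <- factor(c("yes", "rather no", "yes", "yes", "yes", "rather yes", "rather yes",
               "rather no", "rather no", "no", "no", "no"),
             levels = lev, ordered = TRUE)

(v1n <- to_unit_score(v1))  # numeric scores in [0, 1], one per response
(v1n.mean <- mean(v1n))
(v1n.sd   <- sd(v1n))
```

Because `as.integer()` on a factor returns the level codes in level order, the ordering defined by `levels=` is what determines the scores.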

Answers (1)

G5W

Updated Answer:
This answer has been updated because:
1. the data v1, v2, v3 in the question have changed, and
2. labels for the three bars have been added.

The upper part is still mostly the original answer. Below it is a newer answer responding to the OP's clarification.

Original answer (mostly unchanged)
Here is something like what you are asking for. However, it cannot show a central tendency where none exists. After we look at the graphs, I will discuss that a bit more fully.

The idea is to make a blank plot and then draw a grayscale bar for each variable (v1, v2, v3). The part of a bar at the level with the fewest responses is black, and the part at the level with the most responses is white. In between, the gray level is scaled linearly with the number of responses, and along the bar the shade is interpolated linearly between neighboring levels.
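The count-to-gray mapping described in this paragraph can be pulled out into a small function. This is a sketch; `gray_index` is a hypothetical name, and unlike the inline formula in the code it also guards against every level having the same count (there `max(Tab) - min(Tab)` is 0 and the formula returns `NaN`). Using a middle gray in that case is an arbitrary choice.

```r
## Hypothetical helper: rescale category counts to indices 1..100 into a gray palette,
## where 1 (black) = fewest responses and 100 (white) = most responses.
gray_index <- function(counts) {
  counts <- as.numeric(counts)
  rng <- range(counts)
  if (rng[1] == rng[2]) {
    # All levels equally common: no contrast to show, so use one middle gray
    return(rep(50L, length(counts)))
  }
  as.integer(round(99 * (counts - rng[1]) / (rng[2] - rng[1])) + 1)
}

## Counts per level for v1 in the question: no = 3, rather no = 3, rather yes = 2, yes = 4
idx <- gray_index(c(3, 3, 2, 4))
GrayScale <- gray.colors(100, start = 0.05, end = 0.97)
GrayScale[idx]  # hex colors, darkest for "rather yes", lightest for "yes"
```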

## To make it easy to refer to the different variables
Responses = list(v1,v2,v3)

## 100 colors to allow for a lot of continuity
## color 1 is black, color 100 is white
GrayScale = gray.colors(100, start=0.05, end=0.97)

## Make a blank plot
plot(NULL, type="n", xlab="", ylab="", bty="n", xaxt="n", yaxt="n",
    xlim=c(1,4), ylim=c(1,length(Responses)+1))

## Plot all of the bars
for(j in 1:length(Responses)) {
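    ## Count responses per level and rescale the counts to palette indices 1..100
    Tab = table(Responses[[j]])
    Tab = round(99*(Tab-min(Tab))/(max(Tab)-min(Tab)))+1
    ## Interpolate the four level values linearly along the bar
    x = seq(1,4,0.01)
    Density = round(approx(1:4, Tab , x)$y)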

    ## Make a smooth looking bar
    for(i in 1:(length(x)-1))  {
        polygon(c(x[i],x[i],x[i+1],x[i+1]), c(j,j+0.75,j+0.75,j), 
            col=GrayScale[Density[i]], border=NA)
    }
}
## Add labels
text(1:4, 4, levels(v1))
axis(2, at=(1:3)+0.4, labels=c("v1", "v2", "v3"), lwd=0, lwd.ticks=1, las=1)

[Figure: Barplot based on distribution of data]
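The inner `polygon()` loop above draws one small polygon per step. A possibly simpler variant, sketched below under the same assumptions (four shared levels, 0.01 steps, base graphics only), passes all slices of a bar to `rect()` in one call, since `rect()` is vectorized over its coordinates and `col`. `draw_count_bars` is a hypothetical name; the data are rebuilt so the block runs on its own, and it assumes all factors share the levels of the first one.

```r
## Hypothetical function: one horizontal bar per variable, shaded by response counts
draw_count_bars <- function(Responses, step = 0.01) {
  GrayScale <- gray.colors(100, start = 0.05, end = 0.97)  # 1 = black, 100 = white
  n   <- length(Responses)
  lev <- levels(Responses[[1]])
  k   <- length(lev)
  plot(NULL, type = "n", xlab = "", ylab = "", bty = "n", xaxt = "n", yaxt = "n",
       xlim = c(1, k), ylim = c(1, n + 1))
  x <- seq(1, k, by = step)
  for (j in seq_len(n)) {
    Tab  <- as.numeric(table(Responses[[j]]))
    span <- max(Tab) - min(Tab)
    idx  <- if (span == 0) rep(50, k) else round(99 * (Tab - min(Tab)) / span) + 1
    Density <- round(approx(seq_len(k), idx, x)$y)
    ## One rect() call draws every slice of bar j
    rect(x[-length(x)], j, x[-1], j + 0.75,
         col = GrayScale[Density[-length(Density)]], border = NA)
  }
  text(seq_len(k), n + 1, lev)
  axis(2, at = seq_len(n) + 0.4, labels = names(Responses), lwd = 0, lwd.ticks = 1, las = 1)
  invisible(lapply(Responses, table))
}

lev <- c("no", "rather no", "rather yes", "yes")
mk  <- function(x) factor(x, levels = lev, ordered = TRUE)
Responses <- list(
  v1 = mk(c("yes", "rather no", "yes", "yes", "yes", "rather yes", "rather yes",
            "rather no", "rather no", "no", "no", "no")),
  v2 = mk(c("rather yes", "rather yes", rep("rather no", 7))),
  v3 = mk(c("yes", "yes", "yes", "rather yes", "rather yes", "rather yes",
            "rather no", "no"))
)
counts <- draw_count_bars(Responses)
```

Using a named list means the y-axis labels come from the names instead of being typed by hand.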

Answer to modified question
This answer just plots Gaussian distributions using the means and standard deviations that you calculated. The Gaussians are drawn in the style of the previous answer, with the mean shown in white and the point farthest from the mean in black.
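Written out, the shading in the code below works as follows (my reading of the code, using its names: μ_j is `Means[j]`, σ_j is `SD[j]`, and x runs over the grid `seq(1,4,0.03)`):

```latex
u = \frac{x - 1}{3}, \qquad
d_j(x) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\!\left(-\frac{(u - \mu_j)^2}{2\sigma_j^2}\right), \qquad
g_j(x) = \operatorname{round}\!\left(99 \cdot \frac{d_j(x) - \min_x d_j}{\max_x d_j - \min_x d_j}\right) + 1
```

Here g_j(x) is the index into `GrayScale` (1 = black, 100 = white). Because of the min-max rescaling, the constant 1/(σ_j√2π) cancels, so every bar spans essentially the full range from black to white; σ_j only controls how quickly the shade darkens away from μ_j, and the sample size enters only through μ_j and σ_j themselves.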

Means = c(v1n.mean, v2n.mean, v3n.mean)
SD    = c(v1n.sd, v2n.sd, v3n.sd)

## 100 colors to allow for a lot of continuity
## color 1 is black, color 100 is white
GrayScale = gray.colors(100, start=0.05, end=0.97)

## Make a blank plot
plot(NULL, type="n", xlab="", ylab="", bty="n", xaxt="n", yaxt="n",
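    xlim=c(1,4), ylim=c(1,length(Means)+1))

for(j in 1:length(Means)) {
    ## Map x from [1,4] onto the 0..1 score scale and evaluate the normal density there
    x = seq(1,4,0.03)
    y = dnorm((x-1)/3, Means[j], SD[j])
    ## Rescale to palette indices: 1 (black) = lowest density, 100 (white) = highest
    y = round(99*(y-min(y))/(max(y)-min(y))) + 1

    ## Make a smooth looking bar
    for(i in 1:(length(x)-1))  {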
        polygon(c(x[i],x[i],x[i+1],x[i+1]), c(j,j+0.75,j+0.75,j), 
            col=GrayScale[y[i]], border=NA)
    }
}
## Add labels
text(1:4, 4, levels(v1))
axis(2, at=(1:3)+0.4, labels=c("v1", "v2", "v3"), lwd=0, lwd.ticks=1, las=1)

[Figure: Barplot from Gaussians]
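If the Gaussian version is to be reused for other sets of variables, one possible packaging is sketched below. It is a sketch under explicit assumptions, not part of the answer as posted: `gaussian_bars` is a hypothetical name, scores are equally spaced on [0, 1] (exact thirds rather than 0.333/0.666), all factors share the levels of the first one, every SD must be positive, and the slices are drawn with the vectorized `rect()` instead of the `polygon()` loop.

```r
## Hypothetical function: draw one Gaussian-shaded bar per ordered factor,
## white at the mean score and darkening with distance from it (scores on [0, 1])
gaussian_bars <- function(Responses, step = 0.03) {
  GrayScale <- gray.colors(100, start = 0.05, end = 0.97)  # 1 = black, 100 = white
  scores <- lapply(Responses, function(f) (as.integer(f) - 1) / (nlevels(f) - 1))
  Means  <- sapply(scores, mean)
  SD     <- sapply(scores, sd)
  stopifnot(all(SD > 0))  # dnorm() needs a positive spread to produce a gradient
  n   <- length(Responses)
  lev <- levels(Responses[[1]])
  k   <- length(lev)
  plot(NULL, type = "n", xlab = "", ylab = "", bty = "n", xaxt = "n", yaxt = "n",
       xlim = c(1, k), ylim = c(1, n + 1))
  x <- seq(1, k, by = step)
  for (j in seq_len(n)) {
    y <- dnorm((x - 1) / (k - 1), Means[j], SD[j])       # density on the 0..1 scale
    y <- round(99 * (y - min(y)) / (max(y) - min(y))) + 1  # palette indices 1..100
    rect(x[-length(x)], j, x[-1], j + 0.75, col = GrayScale[y[-length(y)]], border = NA)
  }
  text(seq_len(k), n + 1, lev)
  axis(2, at = seq_len(n) + 0.4, labels = names(Responses), lwd = 0, lwd.ticks = 1, las = 1)
  invisible(data.frame(mean = Means, sd = SD))
}

lev <- c("no", "rather no", "rather yes", "yes")
Responses <- list(
  v2 = factor(c("rather yes", "rather yes", rep("rather no", 7)), levels = lev, ordered = TRUE),
  v3 = factor(c("yes", "yes", "yes", "rather yes", "rather yes", "rather yes",
                "rather no", "no"), levels = lev, ordered = TRUE)
)
stats <- gaussian_bars(Responses)
stats
```

Returning the means and SDs invisibly keeps the plot call quiet while still letting you check the numbers behind each bar.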
