Reputation: 25
I'm new to R and I would appreciate your help. I have a 3-column data frame that looks like this:
> head(data)
V.hit J.hit frequency
1 IGHV1-62-3*00 IGHJ2*00 0.51937442
2 IGHV5-17*00 IGHJ3*00 0.18853542
3 IGHV3-5*00 IGHJ1*00 0.09777304
4 IGHV2-9*00 IGHJ3*00 0.03040866
5 IGHV5-12*00 IGHJ4*00 0.02900040
6 IGHV5-12*00 IGHJ2*00 0.00910554
This is just part of the data, as an example. I want to create a heat map with "V.hit" on the x-axis and "J.hit" on the y-axis, where the values of the heat map are the frequencies (I'm interested in the frequency of each V + J combination). I tried to use this code for the interpolation:
library(akima)
newData <- with(data, interp(x = `V hit`, y = `J hit`, z = frequency))
but I'm getting this error:
Error in interp.old(x, y, z, xo, yo, ncp = 0, extrap = FALSE, duplicate = duplicate, :
missing values and Infs not allowed
so I don't know how to deal with it. I want to achieve this final output:
> head(fld)
# A tibble: 6 x 5
...1 `IGHJ1*00` `IGHJ2*00` `IGHJ3*00` `IGHJ4*00`
<chr> <dbl> <dbl> <dbl> <dbl>
1 IGHV10-1*00 0.00233 0.00192 NA 0.000512
2 IGHV1-14*00 NA NA 0.00104 NA
3 IGHV1-18*00 NA 0.000914 NA NA
4 IGHV1-18*00 NA NA 0.000131 NA
5 IGHV1-19*00 0.0000131 NA NA NA
6 IGHV1-26*00 NA 0.000214 NA NA
with cells that are "NA" set to 0. I assume I will then be able to use the heatmap function to create my heat map graph. Any help would be really appreciated!
Upvotes: 1
Views: 2513
Reputation: 1196
You can interpolate with a linear model if the variables correlate.
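library(plotly)  # provides plot_ly(), layout() and %>%
mdl <- lm(z ~ ., df)
out <- NULL
## step = range/100; note the parentheses around the difference
for(x in seq(min(df$x), max(df$x), (max(df$x) - min(df$x))/100)){
  tmp <- c()
  for(y in seq(min(df$y), max(df$y), (max(df$y) - min(df$y))/100)){
    h <- predict(
      mdl,
      data.frame(x = x, y = y)
    )
    tmp <- c(tmp, h)
  }
  ## each x value becomes one column of the z matrix
  if(is.null(out)){
    out <- as.matrix(tmp)
  }else{
    out <- cbind(out, tmp)
  }
}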
fig <- plot_ly(z = out, colorscale = "Hot", type = "heatmap")
fig <- fig %>% layout(
title = "Interpolated Heatmap of Z Given x, y",
xaxis = list(
title = "x"
),
yaxis = list(
title = "y"
)
)
fig
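The nested loop can also be written without explicit loops by predicting over a grid in one call. This is only a sketch: the data frame below is made up for illustration, and the column names x, y and z are assumptions carried over from the code above. It needs numeric x and y, so it does not apply directly to categorical columns such as V.hit and J.hit.

```r
# Hypothetical numeric data, used only to make the sketch runnable
set.seed(1)
df <- data.frame(x = runif(50), y = runif(50))
df$z <- 2 * df$x - df$y + rnorm(50, sd = 0.1)

mdl <- lm(z ~ ., df)

# 101 evenly spaced points across the observed range of each predictor
xs <- seq(min(df$x), max(df$x), length.out = 101)
ys <- seq(min(df$y), max(df$y), length.out = 101)

# expand.grid() varies its first argument fastest, so y changes within each x
grid <- expand.grid(y = ys, x = xs)

# Fill column by column: rows correspond to y, columns to x
out <- matrix(predict(mdl, grid), nrow = length(ys))
```

The resulting out matrix has the same layout as the one built by the loop, so it can be passed to plot_ly(z = out, type = "heatmap") in the same way.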
Upvotes: 0
Reputation: 72593
In base R we could adapt @GregSnow's solution for a correlation matrix to a frequency heatmap.
First, we cut the vector, say into quartiles (the default in quantile), and get factor values.
dat$freq.fac <- cut(dat$frequency, quantile(dat$frequency, na.rm=TRUE), include.lowest=T)
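To see what this step produces, here is a small sketch on a made-up vector (the values in x are hypothetical, not taken from the toy data). With no probs argument, quantile() returns the 0%, 25%, 50%, 75% and 100% points, which cut() then uses as bin edges.

```r
# Hypothetical frequencies, one of them missing
x <- c(0.1, 0.4, 0.2, NA, 0.9, 0.7)

# Five break points: minimum, three quartiles, maximum
breaks <- quantile(x, na.rm = TRUE)

# include.lowest = TRUE keeps the minimum inside the first bin
f <- cut(x, breaks, include.lowest = TRUE)

nlevels(f)  # number of bins
is.na(f)    # flags the values that fell into no bin
```

Missing inputs stay NA in the factor, which is why the white color is assigned to them separately in a later step.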
Second, to prepare the colors, we just copy the factor column and relevel it with the built-in heat.colors and a white color for the zero values.
dat <- within(dat, {
freq.col <- freq.fac
levels(freq.col) <- c(heat.colors(length(levels(dat$freq.fac)), rev=T), "#FFFFFF")
})
Third, apply the white color to the NAs and set their frequency to zero.
dat$freq.col[is.na(dat$freq.col)] <- "#FFFFFF"
dat$frequency[is.na(dat$frequency)] <- 0
Fourth, apply xtabs, then create a color matrix by matching colors to levels.
dat.x <- xtabs(frequency ~ v.hit + j.hit, dat)
col.m <- matrix(dat$freq.col[match(dat$frequency, as.vector(dat.x))], nrow=nrow(dat.x))
Finally, plot using the rasterImage function.
op <- par(mar=c(.5, 4, 4, 3)+.1) ## adapt outer margins
plot.new()
plot.window(xlim=c(0, 5), ylim=c(0, 5))
rasterImage(col.m, 0, 1, 5, 5, interpolate=FALSE)
rect(0, 1, 5, 5) ## frame it with a box
## numbers in the cells
text(col(round(dat.x, 3)) - .5, 5.45 - row(round(dat.x, 3))*.8, round(dat.x, 3))
mtext("Frequency heatmap", 3, 2, font=2, cex=1.2) ## title
mtext(rownames(dat.x), 2, at=5.45 -(1:5)*.8, las=2) ## y-axis
mtext(colnames(dat.x), 3, at=(1:5)-.5) ## x-axis (upper)
## a legend
legend(-.15, .75, legend=c("Frequency:\t", 0, paste("<", seq(.25, 1, .25))), horiz=TRUE,
pch=c(NA, rep(22, 5)), col=1, pt.bg=c(NA, levels(dat$freq.col)[c(5, 1:4)]),
bty="n", xpd=TRUE, cex=.75, text.font=2)
par(op) ## reset margins
Toy data:
dat <- structure(list(v.hit = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
j.hit = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L
), .Label = c("F", "G", "H", "I", "J"), class = "factor"),
frequency = c(NA, NA, 0.717618508264422, NA, NA, 0.777445221319795,
NA, 0.212142521282658, 0.651673766085878, 0.125555095961317,
NA, 0.386114092543721, 0.0133903331588954, NA, 0.86969084572047,
0.34034899668768, 0.482080115471035, NA, 0.493541307048872,
0.186217601411045, 0.827373318606988, NA, 0.79423986072652,
0.107943625887856, NA)), row.names = c(NA, -25L), class = "data.frame")
Upvotes: 0
Reputation: 23574
Here is an idea using geom_tile(). Your data is called foo. I created all possible combinations of V.hit and J.hit using complete(). For missing values, I asked complete() to use 0 to fill. Then, I used geom_tile() to produce the following graphic. You may want to consider the order of levels, if necessary.
library(tidyverse)
complete(foo, V.hit, nesting(J.hit), fill = list(frequency = 0)) %>%
ggplot(aes(x = J.hit, y = V.hit, fill = frequency)) +
geom_tile()
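If the wide table from the question is what you are after (V.hit in rows, J.hit in columns, 0 for missing pairs), a base R sketch with xtabs() could look like this. The data frame foo is rebuilt here from the six rows shown in the question purely for illustration. Note that xtabs() sums frequency within each V.hit/J.hit pair, which is an assumption about how duplicated pairs should be handled.

```r
# The six example rows from the question
foo <- data.frame(
  V.hit = c("IGHV1-62-3*00", "IGHV5-17*00", "IGHV3-5*00",
            "IGHV2-9*00", "IGHV5-12*00", "IGHV5-12*00"),
  J.hit = c("IGHJ2*00", "IGHJ3*00", "IGHJ1*00",
            "IGHJ3*00", "IGHJ4*00", "IGHJ2*00"),
  frequency = c(0.51937442, 0.18853542, 0.09777304,
                0.03040866, 0.02900040, 0.00910554)
)

# Cross-tabulate: pairs that never occur are filled with 0
wide <- xtabs(frequency ~ V.hit + J.hit, foo)

# Convert to a plain numeric matrix, keeping the row and column names
m <- matrix(wide, nrow = nrow(wide), dimnames = dimnames(wide))

# Base R heat map without clustering or rescaling
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```

Setting Rowv and Colv to NA switches off the dendrograms, and scale = "none" keeps the raw frequencies instead of row-wise z-scores.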
Upvotes: 2