Reputation: 5
I am putting together a workflow in R that will ultimately be used to assist in migrating a series of very large databases that are similar, but frustratingly different in minor ways.
One of the things I need to be able to visualise is which variable names are present in each database, and what datatype they are.
I have reached the point where I have a summary dataframe that looks very similar to the example below.
category <- c("Location", "Date", "Time", "Number")
species1 <- c("character", "character", "character", "integer")
species2 <- c("integer", "integer", NA, "character")
species3 <- c("character", "posix", "posix", "integer")
species4 <- c(NA, NA, "posix", "integer")
comparison_table <- data.frame(category, species1, species2, species3, species4)
The NA values denote that this variable is not present within a specific database.
My ultimate goal is to construct a plot of coloured squares that makes it easy to identify inconsistent datatypes between the databases (for example, where dates have been recorded as integers instead of POSIX, or where latitude has been recorded as a character instead of an integer).
My gut tells me that geom_raster() in ggplot2 should be the simplest way to achieve this, but I keep coming up short. I know that I need to define the fill in the aesthetic, but every attempt is met with a different error.
comparison_table %>%
ggplot(aes(x = colnames(comparison_table), y = rownames(comparison_table))) +
geom_raster()
A fresh pair of eyes and a less tired brain would be deeply appreciated.
Upvotes: 0
Views: 311
Reputation: 19097
You'll need to restructure your data to fit the grammar of ggplot.
In aes(x, y, fill), each argument should be a column of your data. This tells the subsequent geom_*() function which variable to use when displaying the data.
In your case, you want:
- x to be the category column.
- y to be a new column, species, in which species1, species2, species3 and species4 are gathered, with their corresponding values gathered into a type column.
- In geom_raster(), you should also tell ggplot which variable to use to fill the squares (remember to put fill inside aes() if your fill comes from a column).
library(tidyverse)
# Reshape to long format: one row per category/species pair
comparison_table %>%
  pivot_longer(!category, names_to = "species", values_to = "type") %>%
  # Map the datatype to the fill colour of each square
  ggplot(aes(x = category, y = species, fill = type)) +
  geom_raster()
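To make the reshaping step easier to inspect, here is a minimal self-contained sketch that rebuilds the example data from the question, pivots it, and stores the plot. The names long_table and p, and the "grey90" colour used for missing variables, are my own assumptions for illustration rather than anything from the question.

```r
library(tidyr)
library(ggplot2)

# Example data from the question (NA = variable absent from that database)
comparison_table <- data.frame(
  category = c("Location", "Date", "Time", "Number"),
  species1 = c("character", "character", "character", "integer"),
  species2 = c("integer", "integer", NA, "character"),
  species3 = c("character", "posix", "posix", "integer"),
  species4 = c(NA, NA, "posix", "integer")
)

# Long format: one row per category/species pair (4 x 4 = 16 rows)
long_table <- pivot_longer(
  comparison_table,
  cols = !category,
  names_to = "species",
  values_to = "type"
)

# Each aesthetic now maps to a single column of long_table
p <- ggplot(long_table, aes(x = category, y = species, fill = type)) +
  geom_raster() +
  # Assumed styling choice: draw absent variables in a light grey
  scale_fill_hue(na.value = "grey90")

print(p)
```

Printing long_table before plotting is a quick way to confirm that every database/variable combination ended up on its own row, including the NA entries for variables a database does not have.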
Upvotes: 1