I am starting out using R and ggplot2 to visualize time series data of environmental variables. So far I love the opportunities ggplot2 offers for visualizing the data: I can easily choose different periods and variables to plot and define the aesthetics. But now I have run into my first problem that I wasn't able to solve by googling:
I have 8 data frames ("TreeA" - "TreeH") structured as follows, where TreeA is the name of the data frame, "Time" is the time of measurement in POSIXct format, and Tleaf, Tair and Tdiff are three of the 16 variables:
TreeA
Time Tleaf Tair Tdiff ........
1: 2018-05-18 00:00:00 12.997 13.20000 -0.203
2: 2018-05-18 00:10:00 13.082 13.20000 -0.119
3: 2018-05-18 00:20:00 11.909 12.06700 -0.158
4: 2018-05-18 00:30:00 11.315 11.53300 -0.219
5: 2018-05-18 00:40:00 11.251 11.46700 -0.216
I have melted the data frames to long format, resulting in TreeA_long:
Time variable value
1: 2018-05-18 00:00:00 Tleaf 12.997000000
2: 2018-05-18 00:10:00 Tleaf 13.082000000
3: 2018-05-18 00:20:00 Tair 11.909
4: 2018-05-18 00:30:00 Tair 11.315
5: 2018-05-18 00:40:00 Tdiff 1.251
From this I have successfully plotted graphs with the following ggplot code:
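library(ggplot2)
library(scales)  # for date_format() and date_breaks()

# start.endKW21 is defined elsewhere (presumably a length-2 POSIXct vector: start, end)
ggplot(subset(TreeA_long, variable %in% c("Tleaf", "Tair", "Tdiff")),
       aes(x = Time, y = value, color = variable)) +
  geom_line() +
  scale_x_datetime(limits = start.endKW21, labels = date_format("%d"),
                   breaks = date_breaks("24 hours")) +
  # Values outside the y limits (e.g. negative Tdiff) are dropped as NA
  scale_y_continuous(limits = c(5, 55), breaks = seq(10, 55, by = 2)) +
  labs(title = "Mai/Juni Cbet1", x = "Day", y = "Temperature") +
  theme(legend.position = "right") +
  scale_color_manual(values = c("Tleaf" = "green", "Tair" = "blue", "Tdiff" = "yellow"))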
I have tried to add a second geom_line(data = TreeB_long) to plot variables from the second data frame in the same plot. That worked for plotting all the variables from TreeB, but of course I need to compare the same variables, and I also want to specify aesthetics (line colours, dashed lines, etc.) for each variable.
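For reference, a minimal sketch of how the two-layer approach can be restricted to the same variables, assuming long data frames shaped like TreeA_long (the toy values and the TreeB_long contents below are made up for illustration): each layer gets its own subset and a fixed linetype, while colour still encodes the variable.

```r
library(ggplot2)

# Hypothetical toy data in the same long shape as TreeA_long
times <- as.POSIXct("2018-05-18 00:00:00", tz = "UTC") + 600 * 0:4
TreeA_long <- data.frame(
  Time = rep(times, 2),
  variable = rep(c("Tleaf", "Tair"), each = 5),
  value = c(13.0, 13.1, 11.9, 11.3, 11.3, 13.2, 13.2, 12.1, 11.5, 11.5)
)
# Hypothetical second tree: same structure, shifted values
TreeB_long <- transform(TreeA_long, value = value + 0.5)

vars <- c("Tleaf", "Tair")
p <- ggplot(mapping = aes(x = Time, y = value, color = variable)) +
  # Solid lines for TreeA, restricted to the variables of interest
  geom_line(data = subset(TreeA_long, variable %in% vars), linetype = "solid") +
  # Dashed lines for TreeB, same variables, so the colours line up
  geom_line(data = subset(TreeB_long, variable %in% vars), linetype = "dashed") +
  scale_color_manual(values = c("Tleaf" = "green", "Tair" = "blue"))
print(p)
```

Because the linetype is fixed per layer rather than mapped to a column, the tree identity does not show up in the legend, which is one reason to combine the data frames into one instead.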
So my question is: how can I compare the same variables from TreeA to TreeB in one plot? I hope my question is clear enough and that you can help me somehow. I believe there is an easy solution to my problem, but as I said, googling hasn't yielded good results so far.
Thank you and have a good day! Konrad
So, following Mikko Marttila's proposal, I bound all 8 (already loaded) data frames (treeA, ..., treeH) together into one using tibble::lst and dplyr::bind_rows, resulting in a new data frame:
library(tibble)  # lst()
library(dplyr)   # bind_rows()
Liste <- lst(treeA, treeB, treeC, treeD, treeE, treeF, treeG, treeH)
new <- bind_rows(Liste, .id = "Test")
Test Time Tleaf Tair ....
1: treeA 2018-05-18 00:00:00 12.997 13.20000
2: treeA 2018-05-18 00:10:00 13.082 13.20000
3: treeA 2018-05-18 00:20:00 11.909 12.06700
.....
300: treeH 2018-05-18 00:30:00 11.315 11.53300
301: treeH 2018-05-18 00:40:00 11.251 11.46700
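The Test column comes from the names of the list: tibble::lst() names each element after the object passed in, and the .id argument of dplyr::bind_rows() turns those names into a column. A small sketch with two hypothetical two-row data frames standing in for the real ones:

```r
library(dplyr)
library(tibble)

# Two hypothetical mini data frames standing in for treeA and treeB
treeA <- data.frame(Tleaf = c(12.997, 13.082), Tair = c(13.2, 13.2))
treeB <- data.frame(Tleaf = c(11.909, 11.315), Tair = c(12.067, 11.533))

# lst() names each element after the object, e.g. "treeA"
Liste <- lst(treeA, treeB)
new <- bind_rows(Liste, .id = "Test")
new$Test  # returns "treeA" "treeA" "treeB" "treeB"
```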
After this, using reshape2::melt with two columns defined as id.vars yields a long data frame with 4 columns:
library(reshape2)
long <- melt(new, id.vars = c("Time", "Test"))
long
Time Test variable value
1: 2018-05-18 00:00:00 treeA Tleaf 12.997000000
2: 2018-05-18 00:10:00 treeA Tleaf 13.082000000
3: 2018-05-18 00:20:00 treeA Tleaf 11.909000000
...
300: 2018-05-18 00:30:00 treeH Tleaf 11.315000000
301: 2018-05-18 00:40:00 treeH Tleaf 11.251000000
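With reshape2::melt(), every column not listed in id.vars is treated as a measure variable, so keeping both Time and Test as identifiers preserves the tree label on every row. A sketch on a hypothetical two-row input:

```r
library(reshape2)

# Hypothetical combined data: one row per tree and time
new <- data.frame(
  Time = as.POSIXct(c("2018-05-18 00:00:00", "2018-05-18 00:10:00"), tz = "UTC"),
  Test = c("treeA", "treeB"),
  Tleaf = c(12.997, 13.082),
  Tair = c(13.2, 13.2)
)
long <- melt(new, id.vars = c("Time", "Test"))
# Tleaf and Tair become levels of the factor column `variable`
long
```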
Finally, combining the columns Test and variable with tidyr::unite yields a long-format data frame containing all my data from the 8 input data frames:
library(tidyr)
long2 <- unite(long, variable, c(Test, variable), remove = TRUE)
long2
Time variable value
1: 2018-05-18 00:00:00 treeA_Tleaf 12.997000000
2: 2018-05-18 00:10:00 treeA_Tleaf 13.082000000
3: 2018-05-18 00:20:00 treeA_Tleaf 11.909000000
...
300: 2018-05-18 00:30:00 treeH_Tleaf 11.315000000
301: 2018-05-18 00:40:00 treeH_Tleaf 11.251000000
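tidyr::unite() pastes the listed columns together with sep = "_" by default and, with remove = TRUE, drops the source columns, which is why the new column can reuse the name variable. A hypothetical two-row example:

```r
library(tidyr)

# Hypothetical long data with separate tree and variable columns
long <- data.frame(
  Time = as.POSIXct(c("2018-05-18 00:00:00", "2018-05-18 00:10:00"), tz = "UTC"),
  Test = c("treeA", "treeH"),
  variable = c("Tleaf", "Tleaf"),
  value = c(12.997, 11.251)
)
# Replace Test and variable by one column "variable" holding "Test_variable"
long2 <- unite(long, variable, c(Test, variable), remove = TRUE)
long2$variable  # returns "treeA_Tleaf" "treeH_Tleaf"
```

If you later want per-variable colours and per-tree linetypes, it may be simpler to skip unite() and keep Test and variable as separate columns, mapping them to different aesthetics, e.g. aes(color = variable, linetype = Test).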
This is all I need for ggplot2 to identify and plot values from the different sources. If there are easier ways to achieve this, let me know in the comments. I also think there might be solutions using more functions from base R, but since I need to get things done, I don't mind loading a lot of packages. Note that the data pasted here is only meant to show the structure.
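On the base R question, a sketch that avoids dplyr, tidyr and reshape2, assuming all data frames share the same columns (the toy treeA/treeB below and the variable list are hypothetical). Map() prepends the tree name, do.call(rbind, ...) appends the rows, and stack() turns the measure columns into value/name pairs:

```r
# Hypothetical inputs with identical columns
times <- as.POSIXct(c("2018-05-18 00:00:00", "2018-05-18 00:10:00"), tz = "UTC")
treeA <- data.frame(Time = times, Tleaf = c(12.997, 13.082), Tair = c(13.2, 13.2))
treeB <- data.frame(Time = times, Tleaf = c(11.909, 11.315), Tair = c(12.067, 11.533))

trees <- list(treeA = treeA, treeB = treeB)
# Prepend a Test column holding each list element's name, then append the rows
combined <- do.call(rbind, Map(function(df, nm) cbind(Test = nm, df),
                               trees, names(trees)))

vars <- c("Tleaf", "Tair")
stacked <- stack(combined[vars])  # columns: values, ind (ind = column name)
long_base <- data.frame(
  # stack() goes column by column, so repeat the id columns once per variable
  Time = rep(combined$Time, times = length(vars)),
  Test = rep(combined$Test, times = length(vars)),
  variable = stacked$ind,
  value = stacked$values
)
head(long_base)
```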
I think you should probably append the treeA-treeH datasets, including an indicator variable for the name of the data (e.g. dplyr::bind_rows(tibble::lst(treeA, treeB, <...>, treeH), .id = "data")), then melt() and use the dataset indicator variable to construct your plot.
Here's a simplified example. First, let's read in the data you gave:
txt <- "Date Time Tleaf Tair Tdiff
2018-05-18 00:00:00 12.997 13.20000 -0.203
2018-05-18 00:10:00 13.082 13.20000 -0.119
2018-05-18 00:20:00 11.909 12.06700 -0.158
2018-05-18 00:30:00 11.315 11.53300 -0.219
2018-05-18 00:40:00 11.251 11.46700 -0.216"
treeA <- read.table(text = txt, header = TRUE,
stringsAsFactors = FALSE)
For the sake of the example, I'm also creating a treeB dataset by just adding some noise to treeA:
library(dplyr)
library(ggplot2)
set.seed(1)
n <- nrow(treeA)
treeB <- treeA %>%
mutate_if(is.numeric, function(x) x + rnorm(n))
We can now append the two datasets with bind_rows() and add a variable to show the original data frame.
tree <- tibble::lst(treeA, treeB) %>%
bind_rows(.id = "data") %>%
mutate(dttm = as.POSIXct(paste(Date, Time)))
Before plotting, it's useful to reshape the data to long form, as you have done before:
tree_long <- reshape2::melt(tree, measure = c("Tleaf", "Tair", "Tdiff"))
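As far as I know, reshape2 is retired in favour of tidyr, so here is a sketch of the same reshaping step with tidyr::pivot_longer(), using a hypothetical two-row stand-in for the tree data frame built above (the treeB values are made up):

```r
library(tidyr)

# Hypothetical stand-in for `tree`: one row per dataset and time point
tree <- data.frame(
  data = c("treeA", "treeB"),
  dttm = as.POSIXct(c("2018-05-18 00:00:00", "2018-05-18 00:00:00"), tz = "UTC"),
  Tleaf = c(12.997, 13.1),
  Tair = c(13.2, 13.4),
  Tdiff = c(-0.203, -0.3)
)
# One output row per (input row, measured column), in column order
tree_long <- pivot_longer(tree, cols = c(Tleaf, Tair, Tdiff),
                          names_to = "variable", values_to = "value")
tree_long
```

Unlike melt(), pivot_longer() returns variable as a character column, so facet_wrap() would order the panels alphabetically; wrap it in factor(variable, levels = c("Tleaf", "Tair", "Tdiff")) if the original order matters.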
Now we are ready to plot. The choice of layout will of course depend on what aspect of the data you want to emphasize; for example, if the comparison between the different tree datasets is of interest, it might be a good idea to use faceting to compare the trees within each variable:
ggplot(tree_long, aes(dttm, value, color = data)) +
facet_wrap(~ variable, scales = "free_y", ncol = 1) +
geom_line()
Created on 2018-07-09 by the reprex package (v0.2.0.9000).
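If you'd rather keep everything in one panel, as in the question, one option is to map the variable to colour and the source data frame to linetype, which gives the per-variable colours and dashed lines asked about. A sketch on made-up data shaped like tree_long (the values are random, not the real measurements):

```r
library(ggplot2)

# Hypothetical long data: 5 time points x 2 datasets x 2 variables
times <- as.POSIXct("2018-05-18 00:00:00", tz = "UTC") + 600 * 0:4
tree_long <- expand.grid(dttm = times, data = c("treeA", "treeB"),
                         variable = c("Tleaf", "Tair"), stringsAsFactors = FALSE)
set.seed(1)
tree_long$value <- 12 + rnorm(nrow(tree_long))

# ggplot2 groups lines by the combination of the discrete colour and linetype
p <- ggplot(tree_long, aes(dttm, value, color = variable, linetype = data)) +
  geom_line() +
  scale_color_manual(values = c("Tleaf" = "green", "Tair" = "blue")) +
  scale_linetype_manual(values = c("treeA" = "solid", "treeB" = "dashed"))
print(p)
```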