Konrad Bauer

Reputation: 101

Plotting multiple variables with the same name from different dataframes in a time series; ggplot2 tidyr

I am starting over with R and ggplot2 to visualize time series data of environmental variables. So far I love the opportunities ggplot2 offers for visualizing the data: easily choosing different periods and variables to plot, and defining aesthetics. But now I have run into the first problem that I wasn't really able to google:

I have 8 data frames ("TreeA" to "TreeH") structured as follows, where TreeA is the name of the data frame, "Time" (shown as "Zeit" in the printout below) is the time of measurement in POSIXct format, and Tleaf, Tair and Tdiff are three of 16 variables:

 TreeA
                         Zeit  Tleaf     Tair  Tdiff ........
       1: 2018-05-18 00:00:00 12.997 13.20000 -0.203   
       2: 2018-05-18 00:10:00 13.082 13.20000 -0.119     
       3: 2018-05-18 00:20:00 11.909 12.06700 -0.158   
       4: 2018-05-18 00:30:00 11.315 11.53300 -0.219     
       5: 2018-05-18 00:40:00 11.251 11.46700 -0.216

I have melted the data frames to long format, resulting in:

TreeA_long
                      Time variable        value
    1: 2018-05-18 00:00:00    Tleaf        12.997000000
    2: 2018-05-18 00:10:00    Tleaf        13.082000000
    3: 2018-05-18 00:20:00    Tair         11.909
    4: 2018-05-18 00:30:00    Tair         11.315
    5: 2018-05-18 00:40:00    Tdiff         1.251

From this I have been successfully plotting graphs with the following ggplot2 code:

library(ggplot2)
library(scales)  # provides date_format() and date_breaks()

# start.endKW21 is a length-2 POSIXct vector (start and end of the plotted period), defined elsewhere
ggplot(subset(TreeA_long, variable %in% c("Tleaf", "Tair", "Tdiff")),
       aes(x = Time, y = value, color = variable)) +
  geom_line() +
  scale_x_datetime(limits = start.endKW21, labels = date_format("%d"), breaks = date_breaks("24 hours")) +
  scale_y_continuous(limits = c(5, 55), breaks = seq(10, 55, by = 2)) +
  labs(title = "Mai/Juni Cbet1", x = "Day", y = "Temperature") +
  theme(legend.position = "right") +
  scale_color_manual(values = c("Tleaf" = "green", "Tair" = "blue", "Tdiff" = "yellow"))

I have tried adding a second geom_line(data = TreeB_long) to plot variables from the second data frame in the same plot. That plotted all the variables from TreeB, but of course I need to compare the same variables across trees, and I also want to specify aesthetics (line color, dashing, etc.) for each variable.

So my question is:

I hope that my question is clear enough and that you can help me somehow. I believe there is an easy solution to my problem, but as I said, googling hasn't yielded good results so far.

Thank you and have a good day! Konrad

Upvotes: 0

Views: 1151

Answers (2)

Konrad Bauer

Reputation: 101

So, following Mikko Marttila's proposal, I bound all 8 already-loaded data frames (treeA, ..., treeH) together into one using tibble::lst and dplyr::bind_rows, resulting in a new data frame:

Liste <- lst(treeA, treeB, treeC, treeD, treeE, treeF, treeG, treeH)
new   <- bind_rows(Liste, .id="Test")

    >         Test                Time  Tleaf     Tair   ....
    >     1: treeA 2018-05-18 00:00:00 12.997 13.20000 
    >     2: treeA 2018-05-18 00:10:00 13.082 13.20000 
    >     3: treeA 2018-05-18 00:20:00 11.909 12.06700 
.....
    >   300: treeH 2018-05-18 00:30:00 11.315 11.53300 
    >   301: treeH 2018-05-18 00:40:00 11.251 11.46700 

After this, reshape2::melt with two columns defined as id.vars yields a long data frame with 4 columns:

long <-melt(new, id.vars = c("Time", "Test"))

     long
                           Time  Test variable        value
         1: 2018-05-18 00:00:00 treeA    Tleaf 12.997000000
         2: 2018-05-18 00:10:00 treeA    Tleaf 13.082000000
         3: 2018-05-18 00:20:00 treeA    Tleaf 11.909000000
...
       300: 2018-05-18 00:30:00 treeH    Tleaf 11.315000000
       301: 2018-05-18 00:40:00 treeH    Tleaf 11.251000000

Finally, combining the columns Test and variable with tidyr::unite yields a long-format data frame that includes all my data from the 8 input data frames:

long2 <- unite(long, variable, c(Test, variable), remove=TRUE)

long2
                       Zeit       variable        value
     1: 2018-05-18 00:00:00    treeA_Tleaf 12.997000000
     2: 2018-05-18 00:10:00    treeA_Tleaf 13.082000000
     3: 2018-05-18 00:20:00    treeA_Tleaf 11.909000000
...
   300: 2018-05-18 00:30:00    treeH_Tleaf 11.315000000
   301: 2018-05-18 00:40:00    treeH_Tleaf 11.251000000

Having this is all I need for ggplot2 to identify and load values for plotting from the different sources. If there are easier ways to achieve this, let me know in the comments. I also think there might be solutions using more functions from base R, but since I need to get things done I don't mind loading a lot of packages. Note that the data pasted here only illustrates the structure.
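
One possibly shorter route, as a minimal sketch rather than a tested replacement: tidyr::pivot_longer (available from tidyr 1.0.0) can take the place of reshape2::melt, so binding, reshaping and uniting fit in one pipeline. The two small data frames below are made-up stand-ins for the real tree data, with only two of the 16 variables.

```r
library(dplyr)
library(tidyr)

# Made-up stand-ins for two of the tree data frames (illustrative values only).
time  <- as.POSIXct("2018-05-18 00:00:00", tz = "UTC") + 600 * 0:2
treeA <- data.frame(Time = time, Tleaf = c(13.0, 13.1, 11.9), Tair = c(13.2, 13.2, 12.1))
treeB <- data.frame(Time = time, Tleaf = c(12.5, 12.8, 11.5), Tair = c(13.0, 13.1, 12.0))

long2 <- tibble::lst(treeA, treeB) %>%
  bind_rows(.id = "Test") %>%                     # stack, keeping the source name in "Test"
  pivot_longer(c(Tleaf, Tair),
               names_to = "variable",
               values_to = "value") %>%           # wide to long
  unite("variable", Test, variable)               # e.g. "treeA_Tleaf"

head(long2)
```

Leaving out the final unite() step keeps Test and variable as separate columns, which can then be mapped to different aesthetics in ggplot2.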

Upvotes: 0

Mikko Marttila

Reputation: 11878

I think you should probably append the treeA-treeH datasets, including an indicator variable for the name of the data (e.g. dplyr::bind_rows(tibble::lst(treeA, treeB, <...>, treeH), .id = "data")), then melt() and use the dataset indicator variable to construct your plot.

Here's a simplified example. First, let's read in the data that you give:

txt <- "Date Time  Tleaf     Tair  Tdiff
2018-05-18 00:00:00 12.997 13.20000 -0.203
2018-05-18 00:10:00 13.082 13.20000 -0.119
2018-05-18 00:20:00 11.909 12.06700 -0.158
2018-05-18 00:30:00 11.315 11.53300 -0.219
2018-05-18 00:40:00 11.251 11.46700 -0.216"

treeA <- read.table(text = txt, header = TRUE,
                    stringsAsFactors = FALSE)

For the sake of the example, I'm also creating a treeB dataset by just adding some noise to treeA:

library(dplyr)
library(ggplot2)

set.seed(1)
n <- nrow(treeA)

treeB <- treeA %>%
  mutate_if(is.numeric, function(x) x + rnorm(n))
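
In dplyr 1.0.0 and later, mutate_if() is superseded by across(). A minimal sketch of the same noise step in that style, assuming a recent dplyr and using a cut-down, hand-typed copy of treeA:

```r
library(dplyr)

# Cut-down copy of treeA (first three rows, two measurement columns).
treeA <- data.frame(
  Date  = "2018-05-18",
  Time  = c("00:00:00", "00:10:00", "00:20:00"),
  Tleaf = c(12.997, 13.082, 11.909),
  Tair  = c(13.200, 13.200, 12.067),
  stringsAsFactors = FALSE
)

set.seed(1)

# Add standard-normal noise to every numeric column; character columns are untouched.
treeB <- treeA %>%
  mutate(across(where(is.numeric), ~ .x + rnorm(length(.x))))
```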

We can now append the two datasets with bind_rows() and add a variable to show the original data frame.

tree <- tibble::lst(treeA, treeB) %>%
  bind_rows(.id = "data") %>%
  mutate(dttm = as.POSIXct(paste(Date, Time)))
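
Note that as.POSIXct() without a tz argument interprets the strings in the session's local time zone. A minimal sketch with an explicit format and time zone follows; UTC is only an assumption here, since the question does not say which zone the measurements were logged in.

```r
date <- c("2018-05-18", "2018-05-18")
time <- c("00:00:00", "00:10:00")

# Parse with an explicit format and time zone (UTC is an assumption).
dttm <- as.POSIXct(paste(date, time),
                   format = "%Y-%m-%d %H:%M:%S",
                   tz = "UTC")

# Interval between the two measurements, in minutes.
step_mins <- as.numeric(difftime(dttm[2], dttm[1], units = "mins"))
```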

Before plotting, it's useful to reshape the data to long form, as you have done before:

tree_long <- reshape2::melt(tree, measure.vars = c("Tleaf", "Tair", "Tdiff"))

Now we are ready to plot. The choice of layout will of course depend on what aspect of the data you want to emphasize; for example, if the comparison between different tree datasets is of interest, it might be a good idea to use faceting to compare the trees within each variable:

ggplot(tree_long, aes(dttm, value, color = data)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 1) +
  geom_line()

Created on 2018-07-09 by the reprex package (v0.2.0.9000).
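
If you would rather have everything in a single panel with per-variable styling, as asked in the question, one option is to map color to the variable and linetype to the dataset. The following is a minimal sketch, not part of the reprex above: it builds its own made-up long-format data with the same column names as tree_long, and the particular colors and line types are arbitrary choices.

```r
library(ggplot2)

# Made-up long-format data with the same columns as tree_long (illustrative values only).
dttm <- as.POSIXct("2018-05-18 00:00:00", tz = "UTC") + 600 * 0:4
tree_long <- data.frame(
  dttm     = rep(dttm, times = 4),
  data     = rep(c("treeA", "treeB"), each = 5, times = 2),
  variable = rep(c("Tleaf", "Tair"), each = 10),
  stringsAsFactors = FALSE
)
set.seed(1)
tree_long$value <- 12 + rnorm(nrow(tree_long))

# Color distinguishes the variable, line type distinguishes the tree.
p <- ggplot(tree_long, aes(dttm, value, color = variable, linetype = data)) +
  geom_line() +
  scale_color_manual(values = c(Tleaf = "green", Tair = "blue")) +
  scale_linetype_manual(values = c(treeA = "solid", treeB = "dashed"))

p
```

With eight trees, line types become hard to tell apart, so faceting by variable or by tree is likely to stay more readable.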

Upvotes: 0
