Forsaken_PhD

Reputation: 1

How do I create a line graph using multiple variables when the multiple variables are all in the same column?

WorkingData

structure(list(
  Sample.Id = c(NA, "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
                "3", "3", "3", "3", "3", "3", "3", "3", "3"),
  Sampling..Date = c(NA, "08-Sep-14", "14-Oct-14", "02-Nov-14", "21-Nov-14",
                     "03-Dec-14", "15-Dec-14", "11-Jan-15", "08-Feb-15",
                     "01-Mar-15", "06-Apr-15", "03-Sep-14", "08-Sep-14",
                     "14-Oct-14", "02-Nov-14", "21-Nov-14", "03-Dec-14",
                     "15-Dec-14", "11-Jan-15", "26-Jan-15"),
  Tot.P = c("µg/ml", "0.002", "0.017", "0.035", "0.04", "0.059", "0.155",
            "0.021", "0.022", "0.025", "<0.009", "0.021", "0.003", "0.036",
            "0.141", "0.041", "0.044", "0.01", "0.023", "0.016"),
  DOC = c("µg/ml", NA, "12.3", "13.4", "12.5", "9.9", "14.7", "8.8", "8.3",
          "0.026", "7.5", "13.4", NA, "14.6", "16.6", "14.7", "12.6", "12.6",
          "10.6", "11.4"),
  Tot.N = c("µg/ml", NA, "3.63", "4.12", "3.98", "4.08", "3.38", "3.63",
            "4.88", "8.3", "2.74", "2.48", NA, "3.07", "3.38", "3.3", "3.43",
            "2.19", "2.77", "4.25"),
  DOC.1 = c("µg/ml", "13.6", NA, NA, NA, NA, NA, NA, NA, NA, NA, "14.44",
            "16.85", NA, NA, NA, NA, NA, NA, NA),
  Tot.P.1 = c("µg/ml", "0.053", NA, NA, NA, NA, NA, NA, NA, NA, NA, "0.08",
              "0.071", NA, NA, NA, NA, NA, NA, NA),
  Total.N = c("µg/ml", "3.363", NA, NA, NA, NA, NA, NA, NA, NA, NA, "2.645",
              "2.637", NA, NA, NA, NA, NA, NA, NA)
), row.names = c(NA, 20L), class = "data.frame")

I have a set of water quality data from 2014-2022 covering different sites and different time periods. Each site has a different monitoring period, and the samples were analysed on two different devices, with only two periods of overlap in which samples were run on both machines. I am trying to plot a time series of P, N and DOC for each site and shade the periods where one machine was used instead of the other. This is all a bit complicated, and I am new to R, so I have been running in circles for a week. My problem is that I am unsure how to select the part of a column I need to create the variable I want.

I have tried to look it up on blogs but can't seem to combine the different pieces of advice into something that works. Any tips would be much appreciated. The data I'm on about is shown above.

Answers (1)

Godrim

Reputation: 543

You will definitely need to clean up your data to fit this solution, but the basic approach is to pivot the data from wide to long form, so that the variable names end up in one column and their values in another.
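As a rough sketch of that clean-up and pivot, here is a four-row excerpt of the question's data, including the units row. The names `raw`, `clean`, `measure_cols`, `variable` and `value` are only illustrative, and dropping the first row assumes it holds units in your real file too.

```r
library(tidyr)

# Small excerpt of the question's data: row 1 holds units, not measurements.
raw <- data.frame(
  Sample.Id = c(NA, "2", "2", "3"),
  Sampling..Date = c(NA, "14-Oct-14", "02-Nov-14", "14-Oct-14"),
  Tot.P = c("µg/ml", "0.017", "0.035", "0.036"),
  DOC   = c("µg/ml", "12.3", "13.4", "14.6"),
  Tot.N = c("µg/ml", "3.63", "4.12", "3.07")
)

# Drop the units row, then convert the measurement columns to numeric.
clean <- raw[-1, ]
measure_cols <- c("Tot.P", "DOC", "Tot.N")
clean[measure_cols] <- lapply(clean[measure_cols], as.numeric)

# Wide to long: one row per sample, date and variable.
long <- pivot_longer(
  clean,
  cols = all_of(measure_cols),
  names_to = "variable",
  values_to = "value"
)
```

Each of the three samples now contributes three rows, one per variable, which is the shape ggplot2 expects when it maps a column to colour.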

Then you need to make sure that your dates are in proper POSIXct format.
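A minimal sketch of that conversion in base R, using three date strings from the question. Note that `%b` matches month abbreviations in the current locale, so English names like "Oct" may fail to parse on a machine with a non-English `LC_TIME` setting. Fixing `tz = "UTC"` is my own choice here, not something the question requires.

```r
dates_chr <- c("08-Sep-14", "14-Oct-14", "11-Jan-15")

# %d = day of month, %b = abbreviated month name, %y = two-digit year.
# If parsing fails because of the locale, Sys.setlocale("LC_TIME", "C")
# is one possible workaround.
dates <- as.POSIXct(dates_chr, format = "%d-%b-%y", tz = "UTC")

# Strings that cannot be parsed come back as NA, so check before plotting.
stopifnot(!anyNA(dates))
format(dates, "%Y-%m-%d")
```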

Then it is just a matter of grouping by your relevant variables and plotting with geom_line().

I added the facet_grid to separate by Sample.Id.

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.2.2
#> Warning: package 'tidyr' was built under R version 4.2.2
#> Warning: package 'purrr' was built under R version 4.2.2
#> Warning: package 'dplyr' was built under R version 4.2.2
#> Warning: package 'stringr' was built under R version 4.2.2
#> Warning: package 'forcats' was built under R version 4.2.2

df <- structure(list(
  Sample.Id = c("2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
                "3", "3", "3", "3", "3", "3", "3", "3", "3"),
  Sampling..Date = c("08-Sep-14", "14-Oct-14", "02-Nov-14", "21-Nov-14",
                     "03-Dec-14", "15-Dec-14", "11-Jan-15", "08-Feb-15",
                     "01-Mar-15", "06-Apr-15", "03-Sep-14", "08-Sep-14",
                     "14-Oct-14", "02-Nov-14", "21-Nov-14", "03-Dec-14",
                     "15-Dec-14", "11-Jan-15", "26-Jan-15"),
  Tot.P = c("0.002", "0.017", "0.035", "0.04", "0.059", "0.155", "0.021",
            "0.022", "0.025", "<0.009", "0.021", "0.003", "0.036", "0.141",
            "0.041", "0.044", "0.01", "0.023", "0.016"),
  DOC = c(NA, "12.3", "13.4", "12.5", "9.9", "14.7", "8.8", "8.3", "0.026",
          "7.5", "13.4", NA, "14.6", "16.6", "14.7", "12.6", "12.6", "10.6",
          "11.4"),
  Tot.N = c(NA, "3.63", "4.12", "3.98", "4.08", "3.38", "3.63", "4.88",
            "8.3", "2.74", "2.48", NA, "3.07", "3.38", "3.3", "3.43", "2.19",
            "2.77", "4.25"),
  DOC.1 = c("13.6", NA, NA, NA, NA, NA, NA, NA, NA, NA, "14.44", "16.85",
            NA, NA, NA, NA, NA, NA, NA)
), row.names = 2:20, class = "data.frame")


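# Clean, reshape and plot in one pipeline.
df |> 
  mutate(Tot.P = str_replace(Tot.P, "<", ""),  # treat "<0.009" as 0.009
         across(Tot.P:DOC.1, as.numeric),      # character columns to numeric
         # "%b" matches month abbreviations in the current locale
         Sampling..Date = as.POSIXct(Sampling..Date, format = "%d-%b-%y")) |> 
  select(-c(DOC.1)) |>                         # DOC.1 is not plotted here
  # Wide to long: variable names go to `name`, measurements to `value`
  pivot_longer(cols = c(Tot.P, DOC, Tot.N)) |> 
  ggplot(aes(x = Sampling..Date, y = value, group = name, col = name)) + 
  geom_line() +                                # one line per variable
  facet_grid(~Sample.Id)                       # one panel per sample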
#> Warning: Removed 5 rows containing missing values (`geom_line()`).

Created on 2023-02-14 with reprex v2.0.2
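
The question also asks about shading the periods in which each machine was used, which the plot above does not do. One possible sketch uses geom_rect() with a small table of periods. The period dates and the labels "Machine A" and "Machine B" below are hypothetical placeholders, because the real changeover dates are not given in the question.

```r
library(ggplot2)

# Three Tot.P values for sample 2, taken from the question's data.
obs <- data.frame(
  date  = as.Date(c("2014-10-14", "2014-11-02", "2014-11-21")),
  value = c(0.017, 0.035, 0.04)
)

# HYPOTHETICAL instrument periods: replace with the real changeover dates.
periods <- data.frame(
  start   = as.Date(c("2014-10-01", "2014-11-10")),
  end     = as.Date(c("2014-11-10", "2014-12-01")),
  machine = c("Machine A", "Machine B")
)

p <- ggplot() +
  # Draw the shaded bands first so the line sits on top of them.
  geom_rect(
    data = periods,
    aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = machine),
    alpha = 0.2
  ) +
  geom_line(data = obs, aes(x = date, y = value)) +
  labs(x = "Sampling date", y = "Tot.P (µg/ml)", fill = "Instrument")

p
```

The sketch uses Date columns throughout. If the x axis is POSIXct, as in the pipeline above, the `start` and `end` columns would need to be POSIXct as well so that both layers share one scale.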

