Mah
Mah

Reputation: 91

R: Plotting multiple csv files independently in one time series plot (also importing csv files with conditions)

I am trying to end up with a time series plot comparing different cities with a center data (data frame).Where center is a dataframe object in R studio, I already imported.

I have a folder with 165 csv files, each representing a city. I want to plot all the 165 csv files (as independent names/data frame) in one plot plus the center data frame.

I want it to look something like this: (with the x-axis being time and the y-axis being CO with all being solid colors.

enter image description here

There are four things I want to do to each csv file, but in the end, have it automated that these four actions are done to each of the 165 csv files.

1) Skip the first 25 row of the csv file

2) Combine the Date and Time column for each csv file

3) Remove the rows where the values in the cells in column 3 is empty

4) Change the name of column 3 from ug/m3 to CO

I want it to perform the four actions on each of the 165 csv files in an automated way.Then, be able to efficiently plot the newly updated csv files in one plot.

I used the code below on one csv file to see if it would work on one csv. I am not sure how to combine everything in an efficient manner.achieve this:

city1 <- read.csv("path",
                        skip = 25)

city1$rtime <- strptime(paste(city1$Date, city1$Time), "%m/%d/%Y %H:%M")

colnames(city1)[3] <- "CO"

city[,3][!(is.na(city[,3]))] ## side note: help with this would be appreciated, I was unsure of what goes before the comma.

Overall, I want a plot like above with all the 165 cities (csv files). I need help placing the four actions on each csv file and plot them all in one plot.

For the plot, I did this as an example:

ggplot(center, aes(rtime, CO)) + geom_smooth(aes(color="Center"))+
  geom_smooth(data=city1,aes(color="City1"))+
  labs(color="Legend")

UPDATE: The csv file of each city seemed to have combined to create one line.I am not sure if I can post the exact output but it looked like the one below: with the pink line being cities and blue being center.x-axis time and y-axis being CO.I hope this helps.

Result of unique(df.cleaned$cities)

> unique(df.cleaned$cities)
 [1] "WFH4N_YEK04_PORTLAND_08AUG16_R1"
 [2] "WFH2N_QIM23_AUSTIN_30JUL16_R1" 
 [3] "WFH7N_QIM70_NEWYORK_20JUL16_R1"
 [4] "WFH3N_YEK28_NAMPA_23AUG16_R1"
 [5] "WFH9N_YEK18_MESA_12JUL16_R1"
 [6] "WFH6N_QIM10_OAKLAND_11AUG16_R1"
 [7] "WFH3N_YEK01_DETROIT_30AUG16_R1"
 [8] "WFH6N_YEK05_ATLANTA_30AUG16_R1"
 [9] "WFH1N_YEK32_LONGBEACH_01JUL16_R1"
[10] "WFH8N_YEK39_LOSANGELES_30AUG16_R1"
[11] "WFH5N_YEK59_BALTIMORE_31AUG16_R1"
[12] "WFH1N_QIM19_MEMPHIS_01JUL16_R1"
[13] "WFH0N_YEK2087_DENVER_09JUL16_R1"
[14] "WFH4N_QIM43_CLEVELAND_30AUG16_R1"
[15] "WFH8N_QIM65_HARTFORD_30AUG16_R1"
[16] "WFH2N_YEK66_SEATTLE_30AUG16_R1"
[17] "WFH0N_YEK17_SANJOSE_30AUG16_R1"

enter image description here

Upvotes: 2

Views: 1591

Answers (1)

Brian
Brian

Reputation: 8275

This is a fully tidyverse solution, so base R traditionalists, look away now.

library(dplyr)
library(purrr)
library(tidyr)
library(readr)
library(stringr)
library(lubridate)
# or just library(tidyverse)

df <-
  data_frame(files = list.files(path = "./yourfilepath",    
                                pattern = "csv", 
                                full.names = T)) %>%
    mutate(dfs = map(files, read_csv, skip = 25)) %>%       
    unnest() %>% 
    mutate(cities = str_replace_all(files, "./yourfilepath/", ""),
           cities = str_replace_all(cities, ".csv", ""))
  • This chunk first makes a list of all the files in the folder yourfilepath named *.csv.
  • Then you use purrr::map to run the same function on each element of the files variable: read_csv, skipping the first 25 lines.
  • Now you have a dataframe with 165 nested dataframes in the column dfs.
  • unnest() expands these to full dataframes, repeating the files rows as needed.
  • Then you can strip out the extra parts of the path from files to get your city names, assuming that the filename of the .CSV is the name of the city.
    • If you have the name of the city already in a column inside each .CSV, then you don't need the mutate(cities = ...) lines.

Now it's easy peasy to do the rest:

df.cleaned <- df %>%
  rename(CO = `ug/m3`) %>%
  filter( !is.na(CO)) %>%
  mutate(rtime = paste(Date, Time),
         rtime = mdy_hm(rtime))

And now use your plotting code, slightly updated:

ggplot(center, aes(rtime, CO)) + 
  geom_smooth(aes(color="Center"))+
  geom_smooth(data = df.cleaned, aes(color = cities))+
  labs(color="Legend")

Update with more regex

df.cleaned <- df.cleaned %>%
  group_by(files) %>%
  mutate(cities = str_c(str_extract_all(cities, 
                                  "[A-Z][a-z]+", 
                                  simplify = T), 
                        collapse = " "))

Upvotes: 1

Related Questions