Reputation: 91
I am trying to end up with a time series plot comparing different cities with a center
data (data frame).Where center
is a dataframe object in R studio, I already imported.
I have a folder with 165 csv files, each representing a city. I want to plot all the 165 csv files (as independent names/data frame) in one plot plus the center
data frame.
I want it to look something like this: (with the x-axis being time and the y-axis being CO with all being solid colors.
There are four things I want to do to each csv file, but in the end, have it automated that these four actions are done to each of the 165 csv files.
1) Skip the first 25 row of the csv file
2) Combine the Date and Time column for each csv file
3) Remove the rows where the values in the cells in column 3 is empty
4) Change the name of column 3 from ug/m3 to CO
I want it to perform the four actions on each of the 165 csv files in an automated way.Then, be able to efficiently plot the newly updated csv files in one plot.
I used the code below on one csv file to see if it would work on one csv. I am not sure how to combine everything in an efficient manner.achieve this:
city1 <- read.csv("path",
skip = 25)
city1$rtime <- strptime(paste(city1$Date, city1$Time), "%m/%d/%Y %H:%M")
colnames(city1)[3] <- "CO"
city[,3][!(is.na(city[,3]))] ## side note: help with this would be appreciated, I was unsure of what goes before the comma.
Overall, I want a plot like above with all the 165 cities (csv files). I need help placing the four actions on each csv file and plot them all in one plot.
For the plot, I did this as an example:
ggplot(center, aes(rtime, CO)) + geom_smooth(aes(color="Center"))+
geom_smooth(data=city1,aes(color="City1"))+
labs(color="Legend")
UPDATE: The csv file of each city seemed to have combined to create one line.I am not sure if I can post the exact output but it looked like the one below: with the pink line being cities and blue being center.x-axis time and y-axis being CO.I hope this helps.
Result of unique(df.cleaned$cities)
> unique(df.cleaned$cities)
[1] "WFH4N_YEK04_PORTLAND_08AUG16_R1"
[2] "WFH2N_QIM23_AUSTIN_30JUL16_R1"
[3] "WFH7N_QIM70_NEWYORK_20JUL16_R1"
[4] "WFH3N_YEK28_NAMPA_23AUG16_R1"
[5] "WFH9N_YEK18_MESA_12JUL16_R1"
[6] "WFH6N_QIM10_OAKLAND_11AUG16_R1"
[7] "WFH3N_YEK01_DETROIT_30AUG16_R1"
[8] "WFH6N_YEK05_ATLANTA_30AUG16_R1"
[9] "WFH1N_YEK32_LONGBEACH_01JUL16_R1"
[10] "WFH8N_YEK39_LOSANGELES_30AUG16_R1"
[11] "WFH5N_YEK59_BALTIMORE_31AUG16_R1"
[12] "WFH1N_QIM19_MEMPHIS_01JUL16_R1"
[13] "WFH0N_YEK2087_DENVER_09JUL16_R1"
[14] "WFH4N_QIM43_CLEVELAND_30AUG16_R1"
[15] "WFH8N_QIM65_HARTFORD_30AUG16_R1"
[16] "WFH2N_YEK66_SEATTLE_30AUG16_R1"
[17] "WFH0N_YEK17_SANJOSE_30AUG16_R1"
Upvotes: 2
Views: 1591
Reputation: 8275
This is a fully tidyverse
solution, so base R traditionalists, look away now.
library(dplyr)
library(purrr)
library(tidyr)
library(readr)
library(stringr)
library(lubridate)
# or just library(tidyverse)
df <-
data_frame(files = list.files(path = "./yourfilepath",
pattern = "csv",
full.names = T)) %>%
mutate(dfs = map(files, read_csv, skip = 25)) %>%
unnest() %>%
mutate(cities = str_replace_all(files, "./yourfilepath/", ""),
cities = str_replace_all(cities, ".csv", ""))
yourfilepath
named *.csv. purrr::map
to run the
same function on each element of the files
variable: read_csv
,
skipping the first 25 lines. dfs
. unnest()
expands these to full
dataframes, repeating the files
rows as needed. files
to get your city names,
assuming that the filename of the .CSV is the name of the city.
mutate(cities = ...)
lines.Now it's easy peasy to do the rest:
df.cleaned <- df %>%
rename(CO = `ug/m3`) %>%
filter( !is.na(CO)) %>%
mutate(rtime = paste(Date, Time),
rtime = mdy_hm(rtime))
And now use your plotting code, slightly updated:
ggplot(center, aes(rtime, CO)) +
geom_smooth(aes(color="Center"))+
geom_smooth(data = df.cleaned, aes(color = cities))+
labs(color="Legend")
df.cleaned <- df.cleaned %>%
group_by(files) %>%
mutate(cities = str_c(str_extract_all(cities,
"[A-Z][a-z]+",
simplify = T),
collapse = " "))
Upvotes: 1