Reputation: 857
So I'm working in a script and doing multiple tasks with the same sets of data. But because each task takes a few hundred lines of code, I end up clearing my global environment so I can move on to the next task. Then I have to rerun the lines of code at the top of the script to import my data again and work on my next task. I want to just type one command that will automatically reimport the data once I'm done with one task, so I can start on the next.
Here is essentially what I run every time I need to work on the next task. I import my data with the read_csv function and then filter for the rows I need.
d2015 <- read_csv("Data 2015 CSV.csv")
d2016 <- read_csv("Data 2016 CSV.csv")
d2017 <- read_csv("Data 2017 CSV.csv")
d2018 <- read_csv("Data 2018 CSV.csv")
dta_15 <- d2015 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
dta_16 <- d2016 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
dta_17 <- d2017 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
dta_18 <- d2018 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
I tried wrapping it all in a single braced block, but that didn't work:
rundata <- {
d2015 <- read_csv("Data 2015 CSV.csv")
d2016 <- read_csv("Data 2016 CSV.csv")
d2017 <- read_csv("Data 2017 CSV.csv")
d2018 <- read_csv("Data 2018 CSV.csv")
dta_15 <- d2015 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
dta_16 <- d2016 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
dta_17 <- d2017 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
dta_18 <- d2018 %>% filter(`Number` %in% c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510"))
}
How can I create one command that will rerun all these commands? Best.
Upvotes: 0
Views: 96
Reputation: 887008
We can do this with map from purrr
library(dplyr)
library(purrr)
library(readr)
As the values to filter on are the same across all the datasets, we can store them in one object
nm1 <- c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
"AZ-502", "MI-501", "LA-503", "GA-500", "FL-510")
Get the files whose names follow the specific pattern
files <- list.files(pattern = "^Data \\d{4} CSV\\.csv$")
Loop over the files, read each one with read_csv from readr, and filter the rows to create a list of subsetted data.frames/tibbles. It is better to keep them in a list rather than as individual objects in the global env
lst1 <- map(files, ~ read_csv(.x) %>%
filter(Number %in% nm1)
)
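To get the single command the question asks for, the steps above can be wrapped in a function. A bare { ... } block is evaluated once, on the spot, and does not define anything that can be called again, whereas a function can be re-run whenever needed. This is only a sketch: load_data, its dir and keep arguments, the codes object, and naming the list elements by year are assumptions added for illustration, not part of the original code.

```r
library(dplyr)
library(purrr)
library(readr)

# Values of `Number` to keep (same ones as in the question)
codes <- c("TX-500", "TX-600", "TX-503", "TX-700", "TX-603",
           "AZ-502", "MI-501", "LA-503", "GA-500", "FL-510")

# Hypothetical helper: re-import and filter every yearly file in one call.
# `dir` is the folder holding the CSVs; `keep` is the set of codes to retain.
load_data <- function(dir = ".", keep = codes) {
  pattern <- "^Data (\\d{4}) CSV\\.csv$"
  files <- list.files(dir, pattern = pattern, full.names = TRUE)

  # Pull the four-digit year out of each file name, e.g. "2015"
  years <- sub(pattern, "\\1", basename(files))

  # Read and filter each file; the result is a list named by year
  files %>%
    set_names(years) %>%
    map(~ read_csv(.x) %>% filter(Number %in% keep))
}

# Usage (assumes the CSV files sit in the working directory):
# dta <- load_data()
# dta[["2015"]]   # filtered 2015 data
```

Note that clearing the whole global environment also removes load_data and codes, so it may be simplest to keep this block in its own file and source() it before calling load_data() again.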
Upvotes: 2