Reputation: 77
I have imported a set of .csv files from a directory into a list file like below.
features <- c("ASB_aggregated_by_lsoa_2015.csv", "ASB_aggregated_by_lsoa_2019.csv", "ASB_aggregated_by_lsoa_2022.csv", "Bicycle_theft_aggregated_by_lsoa_2015.csv", "Bicycle_theft_aggregated_by_lsoa_2019.csv", "Bicycle_theft_aggregated_by_lsoa_2022.csv", "Burglary_aggregated_by_lsoa_2015.csv", "Burglary_aggregated_by_lsoa_2019.csv", "Burglary_aggregated_by_lsoa_2022.csv", "Criminal_damage_and_arson_aggregated_by_lsoa_2015.csv", "Criminal_damage_and_arson_aggregated_by_lsoa_2019.csv", "Criminal_damage_and_arson_aggregated_by_lsoa_2022.csv", "Drugs_aggregated_by_lsoa_2015.csv", "Drugs_aggregated_by_lsoa_2019.csv", "Drugs_aggregated_by_lsoa_2022.csv", "Other_crime_aggregated_by_lsoa_2015.csv", "Other_crime_aggregated_by_lsoa_2019.csv", "Other_crime_aggregated_by_lsoa_2022.csv", "Other_theft_aggregated_by_lsoa_2015.csv", "Other_theft_aggregated_by_lsoa_2019.csv", "Other_theft_aggregated_by_lsoa_2022.csv")
The files for each year have a similar column, but the number of cases in each file are different. First, I would like to combine all the files for similar years (2015; 2019 and 2022) into the same dataframe and do a full join based on their similar column. At the end, I would have 3 dataframes for each year.
Thank you
Upvotes: 0
Views: 46
Reputation: 270428
First set up some test files and create features
. Now read the files into DFs
and create a years
vector that gives the corresponding year for each entry in DFs
splitting DFs
by years
giving spl
. Then for each component of the split perform the merge.
# set up test files
for(year in c(2015, 2019, 2022))
for(let in letters[1:3])
write.csv(BOD + year + match(let, letters), paste0("BOD-", year, let, ".csv"),
row.names = FALSE, quote = FALSE)
features <- dir()
# we now have features and the test files so process them
DFs <- Map(read.csv, features)
years <- gsub("\\D", "", features)
spl <- split(DFs, years)
L <- lapply(spl, \(x) Reduce(function(x, y) merge(x, y, by="Time", all=TRUE), x))
str(L) # show structure of result
giving:
List of 3
$ 2015:'data.frame': 9 obs. of 4 variables:
..$ Time : int [1:9] 2017 2018 2019 2020 2021 2022 2023 2024 2025
..$ demand.x: num [1:9] 2024 2026 2035 2032 2032 ...
..$ demand.y: num [1:9] NA 2025 2027 2036 2033 ...
..$ demand : num [1:9] NA NA 2026 2028 2037 ...
$ 2019:'data.frame': 9 obs. of 4 variables:
..$ Time : int [1:9] 2021 2022 2023 2024 2025 2026 2027 2028 2029
..$ demand.x: num [1:9] 2028 2030 2039 2036 2036 ...
..$ demand.y: num [1:9] NA 2029 2031 2040 2037 ...
..$ demand : num [1:9] NA NA 2030 2032 2041 ...
$ 2022:'data.frame': 9 obs. of 4 variables:
..$ Time : int [1:9] 2024 2025 2026 2027 2028 2029 2030 2031 2032
..$ demand.x: num [1:9] 2031 2033 2042 2039 2039 ...
..$ demand.y: num [1:9] NA 2032 2034 2043 2040 ...
..$ demand : num [1:9] NA NA 2033 2035 2044 .
Upvotes: 0
Reputation: 166
You can do what you would want to do by creating a data frame with the filenames, extracting the year (e.g mutate(year=str_extract(filename,"20[0-9]+"))
) and then proceeding with group_by(year)
and map_dfr()
to carry out the other operations on a grouped dataframe. This would give you a single data frame with years, you can subset that manually.
Without the files or knowing at least their structure, it is difficult to go into more details. Also, it seems like you have "Census_combined_data"
for years 2011 and 2021, but no other data for those years, that would of course break the workflow since you are missing files and have files with a different structure.
Upvotes: 0