Reading multiple csv files that contains a particular text from a directory into r

Question

I have imported a set of .csv files from a directory into a list file like below.

features <- c("ASB_aggregated_by_lsoa_2015.csv", "ASB_aggregated_by_lsoa_2019.csv", "ASB_aggregated_by_lsoa_2022.csv", "Bicycle_theft_aggregated_by_lsoa_2015.csv", "Bicycle_theft_aggregated_by_lsoa_2019.csv", "Bicycle_theft_aggregated_by_lsoa_2022.csv", "Burglary_aggregated_by_lsoa_2015.csv", "Burglary_aggregated_by_lsoa_2019.csv", "Burglary_aggregated_by_lsoa_2022.csv", "Criminal_damage_and_arson_aggregated_by_lsoa_2015.csv", "Criminal_damage_and_arson_aggregated_by_lsoa_2019.csv", "Criminal_damage_and_arson_aggregated_by_lsoa_2022.csv", "Drugs_aggregated_by_lsoa_2015.csv", "Drugs_aggregated_by_lsoa_2019.csv", "Drugs_aggregated_by_lsoa_2022.csv", "Other_crime_aggregated_by_lsoa_2015.csv", "Other_crime_aggregated_by_lsoa_2019.csv", "Other_crime_aggregated_by_lsoa_2022.csv", "Other_theft_aggregated_by_lsoa_2015.csv", "Other_theft_aggregated_by_lsoa_2019.csv", "Other_theft_aggregated_by_lsoa_2022.csv")

The files for each year have a similar column, but the number of cases in each file are different. First, I would like to combine all the files for similar years (2015; 2019 and 2022) into the same dataframe and do a full join based on their similar column. At the end, I would have 3 dataframes for each year.

Thank you

G. Grothendieck · Accepted Answer

First set up some test files and create features. Now read the files into DFs and create a years vector that gives the corresponding year for each entry in DFs splitting DFs by years giving spl. Then for each component of the split perform the merge.

# set up test files
for(year in c(2015, 2019, 2022))
  for(let in letters[1:3])
    write.csv(BOD + year + match(let, letters), paste0("BOD-", year, let, ".csv"),
      row.names = FALSE, quote = FALSE)
features <- dir()

# we now have features and the test files so process them
DFs <- Map(read.csv, features)
years <- gsub("\D", "", features)
spl <- split(DFs, years)
L <- lapply(spl, \(x) Reduce(function(x, y) merge(x, y, by="Time", all=TRUE), x))

str(L) # show structure of result

giving:

List of 3
 $ 2015:'data.frame':   9 obs. of  4 variables:
  ..$ Time    : int [1:9] 2017 2018 2019 2020 2021 2022 2023 2024 2025
  ..$ demand.x: num [1:9] 2024 2026 2035 2032 2032 ...
  ..$ demand.y: num [1:9] NA 2025 2027 2036 2033 ...
  ..$ demand  : num [1:9] NA NA 2026 2028 2037 ...
 $ 2019:'data.frame':   9 obs. of  4 variables:
  ..$ Time    : int [1:9] 2021 2022 2023 2024 2025 2026 2027 2028 2029
  ..$ demand.x: num [1:9] 2028 2030 2039 2036 2036 ...
  ..$ demand.y: num [1:9] NA 2029 2031 2040 2037 ...
  ..$ demand  : num [1:9] NA NA 2030 2032 2041 ...
 $ 2022:'data.frame':   9 obs. of  4 variables:
  ..$ Time    : int [1:9] 2024 2025 2026 2027 2028 2029 2030 2031 2032
  ..$ demand.x: num [1:9] 2031 2033 2042 2039 2039 ...
  ..$ demand.y: num [1:9] NA 2032 2034 2043 2040 ...
  ..$ demand  : num [1:9] NA NA 2033 2035 2044 .

Reading multiple csv files that contains a particular text from a directory into r

Answers (2)

Related Questions