Reputation: 523

how to automate combining multiple shapefiles and retain filename as column?

I have multiple point shapefiles, each containing two columns: "id" and "Measurement." The "id" column represents the unique identifier for each point, and these identifiers are consistent across all the individual shapefiles, meaning they are the exact points across multiple files.

My objective is to merge these shapefiles into a single file. To prevent any confusion, I would like to rename the "Measurement" column with the original filename of each shapefile. This way, I will be able to identify the source of the measurement accurately.

Shapefile_1.shp

id	Measurement
1	10.5
2	12.3
3	9.8

Shapefile_2.shp

id	Measurement
1	8.2
2	11.7
3	10.1

Desired Output

id	Shapefile_1	Shapefile_2
1	10.5	8.2
2	12.3	11.7
3	9.8	10.1

thanks!

Upvotes: 0

Answers (2)

margusl

Reputation: 17389

With sf objects in a named list, we could just use purrr::list_rbind() to bind rows and have a name of each sf object in an additional column. And if wide format is indeed required, we can achieve it with tidyr::pivot_wider().

First, let's prepare a reprex:

library(sf)
library(dplyr)
library(tidyr)
library(purrr)

# generate some example datasets based on nc.shp, 
# each dataset with a different mean value
point_meas_dataset <- function(points, mean, sd){
  st_sf(id = seq_along(points),
        measurement = rnorm(length(points), mean, sd),
        geomerty = points)
}
nc_c <- st_read(system.file("shape/nc.shp", package="sf")) %>% 
  st_centroid() %>% 
  st_geometry()

set.seed(123)
shapefile_1 <- point_meas_dataset(nc_c, 10, 1)
shapefile_2 <- point_meas_dataset(nc_c, 12, 1)
shapefile_3 <- point_meas_dataset(nc_c, 14, 1)

# keep sf objects in a named list, i.e use list.files() + lapply() + st_read()
# to import list of shapefiles
meas_shapes <- list("shapefile_1.shp" = shapefile_1,
                    "shapefile_2.shp" = shapefile_2,
                    "shapefile_3.shp" = shapefile_3)

Combine sf objects stored in a meas_shapes while keeping the shapefile name as additional identification:

# row-bind all list elements, to form a long table, store names in "shapefile" column
meas_long <- list_rbind(meas_shapes, names_to ="shapefile") %>% 
  st_as_sf()
meas_long
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
#> Geodetic CRS:  NAD27
#> First 10 features:
#>          shapefile id measurement                   geomerty
#> 1  shapefile_1.shp  1    9.439524  POINT (-81.49823 36.4314)
#> 2  shapefile_1.shp  2    9.769823 POINT (-81.12513 36.49111)
#> 3  shapefile_1.shp  3   11.558708 POINT (-80.68573 36.41252)
#> 4  shapefile_1.shp  4   10.070508 POINT (-76.02719 36.40714)
#> 5  shapefile_1.shp  5   10.129288 POINT (-77.41046 36.42236)
#> 6  shapefile_1.shp  6   11.715065 POINT (-76.99472 36.36142)
#> 7  shapefile_1.shp  7   10.460916 POINT (-76.23402 36.40122)
#> 8  shapefile_1.shp  8    8.734939 POINT (-76.70446 36.44428)
#> 9  shapefile_1.shp  9    9.313147 POINT (-78.11042 36.39693)
#> 10 shapefile_1.shp 10    9.554338 POINT (-80.23429 36.40042)

# with long (tidy) format, the actual number of input shapefiles does not matter and it's 
# easy to aggregate over each feature (i.e. name of the shapefile);
# or use grouping & faceting by shapefile name when using ggplot
meas_long %>% 
  st_drop_geometry() %>% 
  group_by(shapefile) %>% 
  summarise(meas_mean = mean(measurement),
            meas_sd = sd(measurement))
#> # A tibble: 3 × 3
#>   shapefile       meas_mean meas_sd
#>   <chr>               <dbl>   <dbl>
#> 1 shapefile_1.shp      10.1   0.913
#> 2 shapefile_2.shp      11.9   0.967
#> 3 shapefile_3.shp      14.1   0.950

# but we can pivot to wide as well; and perhaps drop geometry column 
# if dataframe / tibble is desired and actual coordinates are not needed anymore:
meas_long %>%
  st_drop_geometry() %>% 
  pivot_wider(names_from = "shapefile", values_from = "measurement")
#> # A tibble: 100 × 4
#>       id shapefile_1.shp shapefile_2.shp shapefile_3.shp
#>    <int>           <dbl>           <dbl>           <dbl>
#>  1     1            9.44            11.3            16.2
#>  2     2            9.77            12.3            15.3
#>  3     3           11.6             11.8            13.7
#>  4     4           10.1             11.7            14.5
#>  5     5           10.1             11.0            13.6
#>  6     6           11.7             12.0            13.5
#>  7     7           10.5             11.2            13.2
#>  8     8            8.73            10.3            13.4
#>  9     9            9.31            11.6            15.7
#> 10    10            9.55            12.9            13.9
#> # ℹ 90 more rows

^{Created on 2023-06-24 with reprex v2.0.2}

Upvotes: 0

Peter

Reputation: 12699

With {dplyr} you could do this:

library(dplyr)


Shapefile_1.shp <- data.frame(id = 1:3,
                  Measurement = c(10.5, 12.3, 9.8)) 

Shapefile_2.shp <- data.frame(id = 1:3,
                              Measurement = c(8.2, 11.7, 10.1 )) 

left_join(Shapefile_1.shp, Shapefile_2.shp, by = "id") |> 
  rename(Shapefile_1 = Measurement.x, Shapefile_2 = Measurement.y)
#>   id Shapefile_1 Shapefile_2
#> 1  1        10.5         8.2
#> 2  2        12.3        11.7
#> 3  3         9.8        10.1

^{Created on 2023-06-24 with reprex v2.0.2}

Upvotes: 0

how to automate combining multiple shapefiles and retain filename as column?

Answers (2)

Related Questions