Reputation: 523
I have multiple point shapefiles, each containing two columns: "id" and "Measurement." The "id" column represents the unique identifier for each point, and these identifiers are consistent across all the individual shapefiles, meaning they are the exact points across multiple files.
My objective is to merge these shapefiles into a single file. To prevent any confusion, I would like to rename the "Measurement" column with the original filename of each shapefile. This way, I will be able to identify the source of the measurement accurately.
Shapefile_1.shp
id | Measurement |
---|---|
1 | 10.5 |
2 | 12.3 |
3 | 9.8 |
Shapefile_2.shp
id | Measurement |
---|---|
1 | 8.2 |
2 | 11.7 |
3 | 10.1 |
Desired Output
id | Shapefile_1 | Shapefile_2 |
---|---|---|
1 | 10.5 | 8.2 |
2 | 12.3 | 11.7 |
3 | 9.8 | 10.1 |
thanks!
Upvotes: 0
Views: 276
Reputation: 17389
With sf
objects in a named list, we could just use purrr::list_rbind()
to bind rows and have a name of each sf
object in an additional column. And if wide format is indeed required, we can achieve it with tidyr::pivot_wider()
.
First, let's prepare a reprex:
library(sf)
library(dplyr)
library(tidyr)
library(purrr)
# generate some example datasets based on nc.shp,
# each dataset with a different mean value
point_meas_dataset <- function(points, mean, sd){
st_sf(id = seq_along(points),
measurement = rnorm(length(points), mean, sd),
geomerty = points)
}
nc_c <- st_read(system.file("shape/nc.shp", package="sf")) %>%
st_centroid() %>%
st_geometry()
set.seed(123)
shapefile_1 <- point_meas_dataset(nc_c, 10, 1)
shapefile_2 <- point_meas_dataset(nc_c, 12, 1)
shapefile_3 <- point_meas_dataset(nc_c, 14, 1)
# keep sf objects in a named list, i.e use list.files() + lapply() + st_read()
# to import list of shapefiles
meas_shapes <- list("shapefile_1.shp" = shapefile_1,
"shapefile_2.shp" = shapefile_2,
"shapefile_3.shp" = shapefile_3)
Combine sf
objects stored in a meas_shapes
while keeping the shapefile name as additional identification:
# row-bind all list elements, to form a long table, store names in "shapefile" column
meas_long <- list_rbind(meas_shapes, names_to ="shapefile") %>%
st_as_sf()
meas_long
#> Simple feature collection with 300 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
#> Geodetic CRS: NAD27
#> First 10 features:
#> shapefile id measurement geomerty
#> 1 shapefile_1.shp 1 9.439524 POINT (-81.49823 36.4314)
#> 2 shapefile_1.shp 2 9.769823 POINT (-81.12513 36.49111)
#> 3 shapefile_1.shp 3 11.558708 POINT (-80.68573 36.41252)
#> 4 shapefile_1.shp 4 10.070508 POINT (-76.02719 36.40714)
#> 5 shapefile_1.shp 5 10.129288 POINT (-77.41046 36.42236)
#> 6 shapefile_1.shp 6 11.715065 POINT (-76.99472 36.36142)
#> 7 shapefile_1.shp 7 10.460916 POINT (-76.23402 36.40122)
#> 8 shapefile_1.shp 8 8.734939 POINT (-76.70446 36.44428)
#> 9 shapefile_1.shp 9 9.313147 POINT (-78.11042 36.39693)
#> 10 shapefile_1.shp 10 9.554338 POINT (-80.23429 36.40042)
# with long (tidy) format, the actual number of input shapefiles does not matter and it's
# easy to aggregate over each feature (i.e. name of the shapefile);
# or use grouping & faceting by shapefile name when using ggplot
meas_long %>%
st_drop_geometry() %>%
group_by(shapefile) %>%
summarise(meas_mean = mean(measurement),
meas_sd = sd(measurement))
#> # A tibble: 3 × 3
#> shapefile meas_mean meas_sd
#> <chr> <dbl> <dbl>
#> 1 shapefile_1.shp 10.1 0.913
#> 2 shapefile_2.shp 11.9 0.967
#> 3 shapefile_3.shp 14.1 0.950
# but we can pivot to wide as well; and perhaps drop geometry column
# if dataframe / tibble is desired and actual coordinates are not needed anymore:
meas_long %>%
st_drop_geometry() %>%
pivot_wider(names_from = "shapefile", values_from = "measurement")
#> # A tibble: 100 × 4
#> id shapefile_1.shp shapefile_2.shp shapefile_3.shp
#> <int> <dbl> <dbl> <dbl>
#> 1 1 9.44 11.3 16.2
#> 2 2 9.77 12.3 15.3
#> 3 3 11.6 11.8 13.7
#> 4 4 10.1 11.7 14.5
#> 5 5 10.1 11.0 13.6
#> 6 6 11.7 12.0 13.5
#> 7 7 10.5 11.2 13.2
#> 8 8 8.73 10.3 13.4
#> 9 9 9.31 11.6 15.7
#> 10 10 9.55 12.9 13.9
#> # ℹ 90 more rows
Created on 2023-06-24 with reprex v2.0.2
Upvotes: 0
Reputation: 12699
With {dplyr} you could do this:
library(dplyr)
Shapefile_1.shp <- data.frame(id = 1:3,
Measurement = c(10.5, 12.3, 9.8))
Shapefile_2.shp <- data.frame(id = 1:3,
Measurement = c(8.2, 11.7, 10.1 ))
left_join(Shapefile_1.shp, Shapefile_2.shp, by = "id") |>
rename(Shapefile_1 = Measurement.x, Shapefile_2 = Measurement.y)
#> id Shapefile_1 Shapefile_2
#> 1 1 10.5 8.2
#> 2 2 12.3 11.7
#> 3 3 9.8 10.1
Created on 2023-06-24 with reprex v2.0.2
Upvotes: 0