Michael Henry
Michael Henry

Reputation: 631

Using sf in R - what is the best way to add geometry to large point dataset?

Disclaimer: I have only started using sf so I might be (hopefully!) missing something obvious here.

I have the AusGeoid2020 data which consists of 15,454,800 points and some attributes to convert between ellipsoidal heights (i.e. GPS height) and the AHD.

Although the file is large (914Mb) it's easy enough to read in:

library(plyr)
library(magrittr)
library(dplyr)
library(readr)
library(sf)

AusGeoid2020 <- read_fwf(
  file = "AUSGeoid2020_20170908_win.dat",
  col_positions = fwf_widths(
    widths = c(3L,9L,2L,2L,3L,7L,2L,3L,3L,7L,10L,10L),
    col_names = c(
      "ID",
      "ellipsoid to AHD separation (m)",
      "Latitude (hem)",
      "Latitude (deg)",
      "Latitude (min)",
      "Latitude (sec)",
      "Longitude (hem)",
      "Longitude (deg)",
      "Longitude (min)",
      "Longitude (sec)",
      "deflection of the vertical (seconds, xi)",
      "deflection of the vertical (seconds, eta)"
    )
  ),
  col_types = cols(
    ID = col_character(),
    `ellipsoid to AHD separation (m)` = col_double(),
    `Latitude (hem)` = col_character(),
    `Latitude (deg)` = col_double(),
    `Latitude (min)` = col_double(),
    `Latitude (sec)` = col_double(),
    `Longitude (hem)` = col_character(),
    `Longitude (deg)` = col_double(),
    `Longitude (min)` = col_double(),
    `Longitude (sec)` = col_double(),
    `deflection of the vertical (seconds, xi)` = col_double(),
    `deflection of the vertical (seconds, eta)` = col_double()
  ),
  skip = 1L
)

AusGeoid2020 <- AusGeoid2020 %>% 
  mutate(
    Latitude = `Latitude (deg)` + (`Latitude (min)`/60) + (`Latitude (sec)`/3600),
    Latitude = case_when(
      `Latitude (hem)` == "S" ~ -1 * Latitude,
      TRUE ~ Latitude
    ),
    Longitude = `Longitude (deg)` + (`Longitude (min)`/60) + (`Longitude (sec)`/3600),
    Longitude = case_when(
      `Longitude (hem)` == "W" ~ -1 * Longitude,
      TRUE ~ Longitude
    )
  ) %>% 
  select(
    ID,
    `ellipsoid to AHD separation (m)`,
    Latitude,
    Longitude,
    `deflection of the vertical (seconds, xi)`,
    `deflection of the vertical (seconds, eta)`
  )

My question is: what is the best way to add geometry to this large data frame? I believe the function I want is st_point() which is not vectorised, so I have resorted to using alply() from {plyr} to create the geometry column, but this is very resource intensive which makes me think there must be a better way.

st_geometry(AusGeoid2020) <- st_sfc(
  alply(AusGeoid2020, 1, function(row) {
    st_point(x = c(row$Longitude, row$Latitude), dim = "XY")
  }),
  crs = 7844L
)

This takes a very long time. Any advice appreciated!

Upvotes: 2

Views: 1860

Answers (1)

www
www

Reputation: 39154

We can use st_as_sf as follows. The default setting would remove the columns with coordinate information (in this case, Longitude and Latitude). If you want to keep those columns, set remove = FALSE.

AusGeoid2020_sf <- AusGeoid2020 %>% 
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 7844L, remove = FALSE)

Upvotes: 3

Related Questions