RHinks

Reputation: 41

How to load geojson data from a FeatureCollection into a data.frame?

Hi, I'm trying to load a GeoJSON FeatureCollection from the UK's Office for National Statistics API into a data.frame using the httr package.

library(httr)

HealthGeog <- GET("https://opendata.arcgis.com/datasets/f0095af162f749ad8231e6226e1b7e30_0.geojson")

I get a successful response:

> HealthGeog
Response [https://opendata.arcgis.com/datasets/f0095af162f749ad8231e6226e1b7e30_0.geojson]
  Date: 2018-11-21 13:28
  Status: 200
  Content-Type: application/json; charset=utf-8
  Size: 9.59 MB

But being new to working with JSON, I'm not sure how to navigate to the list within the FeatureCollection and load it into a data.frame.

Upvotes: 1

Views: 393

Answers (1)

hrbrmstr

Reputation: 78792

We can use R's spatial tools to read it, but see the section after this one for why you might not need to:

library(sf)
library(tidyverse)

health_geog_url <- "https://opendata.arcgis.com/datasets/f0095af162f749ad8231e6226e1b7e30_0.geojson"

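# don't be one of 'those people' and waste bandwidth that isn't yours:
# save the response to disk once and skip the download on later runs
if (!file.exists(basename(health_geog_url))) {
  httr::GET(
    url = health_geog_url,
    httr::write_disk(basename(health_geog_url)),
    httr::progress()
  )
}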

health_geog <- st_read(basename(health_geog_url))
## Reading layer `OGRGeoJSON' from data source `/Users/bob/Desktop/f0095af162f749ad8231e6226e1b7e30_0.geojson' using driver `GeoJSON'
## replacing null geometries with empty geometries
## Simple feature collection with 32844 features and 10 fields (with 32844 geometries empty)
## geometry type:  GEOMETRYCOLLECTION
## dimension:      XY
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

health_geog
## Simple feature collection with 32844 features and 10 fields (with 32844 geometries empty)
## geometry type:  GEOMETRYCOLLECTION
## dimension:      XY
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##     LSOA11CD       LSOA11NM   CCG18CD CCG18CDH           CCG18NM   STP18CD
## 1  E01011388     Leeds 019B E38000225      15F     NHS Leeds CCG E54000005
## 2  E01011865 Wakefield 042D E38000190      03R NHS Wakefield CCG E54000005
## 3  E01011833 Wakefield 025E E38000190      03R NHS Wakefield CCG E54000005
## 4  E01011390     Leeds 087A E38000225      15F     NHS Leeds CCG E54000005
## 5  E01011866 Wakefield 045B E38000190      03R NHS Wakefield CCG E54000005
## 6  E01011834 Wakefield 015A E38000190      03R NHS Wakefield CCG E54000005
## 7  E01011391     Leeds 087B E38000225      15F     NHS Leeds CCG E54000005
## 8  E01011867 Wakefield 042E E38000190      03R NHS Wakefield CCG E54000005
## 9  E01011835 Wakefield 012A E38000190      03R NHS Wakefield CCG E54000005
## 10 E01011392     Leeds 087C E38000225      15F     NHS Leeds CCG E54000005
##           STP18NM   LAD18CD   LAD18NM  FID                 geometry
## 1  West Yorkshire E08000035     Leeds 1001 GEOMETRYCOLLECTION EMPTY
## 2  West Yorkshire E08000036 Wakefield 1002 GEOMETRYCOLLECTION EMPTY
## 3  West Yorkshire E08000036 Wakefield 1003 GEOMETRYCOLLECTION EMPTY
## 4  West Yorkshire E08000035     Leeds 1004 GEOMETRYCOLLECTION EMPTY
## 5  West Yorkshire E08000036 Wakefield 1005 GEOMETRYCOLLECTION EMPTY
## 6  West Yorkshire E08000036 Wakefield 1006 GEOMETRYCOLLECTION EMPTY
## 7  West Yorkshire E08000035     Leeds 1007 GEOMETRYCOLLECTION EMPTY
## 8  West Yorkshire E08000036 Wakefield 1008 GEOMETRYCOLLECTION EMPTY
## 9  West Yorkshire E08000036 Wakefield 1009 GEOMETRYCOLLECTION EMPTY
## 10 West Yorkshire E08000035     Leeds 1010 GEOMETRYCOLLECTION EMPTY
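
Since the question asks for a data.frame, one way to get there from an sf object is to drop the geometry column with sf's st_drop_geometry(). The snippet below is only a sketch: toy_geog is a made-up two-row stand-in for health_geog (it is not the real data), built with empty geometries to mirror the output above.

```r
library(sf)

# Hypothetical stand-in for health_geog: two attribute columns plus empty geometries
toy_geog <- st_sf(
  LSOA11CD = c("E01011388", "E01011865"),
  LAD18NM  = c("Leeds", "Wakefield"),
  geometry = st_sfc(st_geometrycollection(), st_geometrycollection(), crs = 4326)
)

# Drop the geometry column; what remains is an ordinary data.frame
toy_df <- st_drop_geometry(toy_geog)

class(toy_df)
names(toy_df)
```

Assuming the download and st_read() above worked, the same call on the real object would be st_drop_geometry(health_geog).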

This seems to be a GeoJSON file with no geometries, which likely means it's really just "data". Many of those opendata.arcgis.com endpoints also have a CSV option, and this one does:

health_geog_url_csv <- "https://opendata.arcgis.com/datasets/f0095af162f749ad8231e6226e1b7e30_0.csv"

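# same idea as before: only download if the file isn't already on disk
if (!file.exists(basename(health_geog_url_csv))) {
  httr::GET(
    url = health_geog_url_csv,
    httr::write_disk(basename(health_geog_url_csv)),
    httr::progress()
  )
}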

read_csv(basename(health_geog_url_csv))
## # A tibble: 32,844 x 10
##    LSOA11CD  LSOA11NM  CCG18CD  CCG18CDH CCG18NM  STP18CD STP18NM  LAD18CD
##    <chr>     <chr>     <chr>    <chr>    <chr>    <chr>   <chr>    <chr>  
##  1 E01011388 Leeds 01… E380002… 15F      NHS Lee… E54000… West Yo… E08000…
##  2 E01011865 Wakefiel… E380001… 03R      NHS Wak… E54000… West Yo… E08000…
##  3 E01011833 Wakefiel… E380001… 03R      NHS Wak… E54000… West Yo… E08000…
##  4 E01011390 Leeds 08… E380002… 15F      NHS Lee… E54000… West Yo… E08000…
##  5 E01011866 Wakefiel… E380001… 03R      NHS Wak… E54000… West Yo… E08000…
##  6 E01011834 Wakefiel… E380001… 03R      NHS Wak… E54000… West Yo… E08000…
##  7 E01011391 Leeds 08… E380002… 15F      NHS Lee… E54000… West Yo… E08000…
##  8 E01011867 Wakefiel… E380001… 03R      NHS Wak… E54000… West Yo… E08000…
##  9 E01011835 Wakefiel… E380001… 03R      NHS Wak… E54000… West Yo… E08000…
## 10 E01011392 Leeds 08… E380002… 15F      NHS Lee… E54000… West Yo… E08000…
## # ... with 32,834 more rows, and 2 more variables: LAD18NM <chr>,
## #   FID <int>

I'd use the CSV option.
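
If you'd rather stay with the GeoJSON response from the question, a minimal sketch with the jsonlite package (not used above, so treat it as an assumption about your setup) shows where the table lives. With jsonlite's default simplification, the features element of a FeatureCollection becomes a data.frame, and its properties column is itself a data.frame holding the attributes. The geojson_txt string is a tiny invented sample with the same shape as the real file, not the real data.

```r
library(jsonlite)

# Invented two-feature sample shaped like the real file (null geometries)
geojson_txt <- '{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"LSOA11CD": "E01011388", "LAD18NM": "Leeds", "FID": 1001},
     "geometry": null},
    {"type": "Feature",
     "properties": {"LSOA11CD": "E01011865", "LAD18NM": "Wakefield", "FID": 1002},
     "geometry": null}
  ]
}'

# fromJSON() simplifies the array of features into a data.frame
parsed <- fromJSON(geojson_txt)

# The attribute table sits in the nested "properties" column
health_df <- parsed$features$properties

str(health_df)

# With the httr response from the question, the text would likely come from:
# fromJSON(httr::content(HealthGeog, as = "text", encoding = "UTF-8"))
```

This skips the spatial tooling entirely, which seems reasonable here only because every geometry is empty.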

Upvotes: 1
