RuffGriffin

Reputation: 13

"Error: Cannot open The file doesn't seem to exist." on an introductory st_read() example

I threw in some more reproducible code but the error is persisting.

Sorry in advance for the lack of a better question; the sample code is breaking before I have a chance to dive into it. I've also never dealt with "./{rest of url}" before. Is that the problem? I'm working through this lesson: https://programminghistorian.org/en/lessons/geospatial-data-analysis

and I'm getting this error: "Error: Cannot open "./data/County1990ussm/"; The file doesn't seem to exist."

I've verified that I'm in the intended working directory (one level above the data folder).

"We start by loading in the selected data. The data for this tutorial can be downloaded here - https://programminghistorian.org/assets/geospatial-data-analysis/data.zip - . Once downloaded place all the files in a folder labeled data inside your working directory in R. We are going to create a variable and read in our data from our variable directory to it. Once run, the County_Aggregate_Data variable will contain the data and geographic information that we will analyze:"

library(sf)
library(tmap)
library(plotly)

setwd("path")
aFile <- "https://programminghistorian.org/assets/geospatial-data-analysis/data.zip"

# check to see whether file exists before downloading and unzipping it
if(!file.exists("data.zip")) {
     download.file(aFile,"data.zip",mode="wb")
     unzip("data.zip")
}
print(list.files("./data"))
County_Aggregate_Data <- st_read("./data/County1990ussm/")

OUTPUT

[1] "County1990_Data"          "County1990ussm"          
[3] "DP_TableDescriptions.xls" "ExtendedZIP5.csv"        
[5] "GeocodedAddresses.csv"    "Religion"                
Error: Cannot open "./data/County1990ussm/"; The file doesn't seem to exist.

Upvotes: 0

Views: 3464

Answers (1)

Len Greski

Reputation: 10855

If the original poster downloaded the zip file into the ./data directory and then unzipped it there with default settings, unzip() creates a second data subdirectory inside it (./data/data), so the path ./data/County1990ussm would not exist.
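One way to check for a nested folder, as a sketch using only base R, is to list every directory below the working directory whose name is County1990ussm. The helper name find_dirs_named() is made up for illustration; list.dirs() and basename() are base R functions.

```r
# Hypothetical helper (not from the lesson): find every directory
# called `name` anywhere below `root`.
find_dirs_named <- function(name, root = ".") {
  dirs <- list.dirs(root, recursive = TRUE, full.names = TRUE)
  dirs[basename(dirs) == name]
}

# "./data/County1990ussm" would match the layout the lesson expects;
# "./data/data/County1990ussm" would mean the archive was unzipped
# one level too deep.
find_dirs_named("County1990ussm")
```

An empty result would suggest that the archive was never extracted under the current working directory at all.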

Without seeing the contents of the original poster's data directory we can't tell whether this is the case. In any event, we can demonstrate how to download the file, unzip it and load one of its component files into R in a single script.

Here is a script that downloads the zip file from the website into the current R working directory, unzips it to ./data and reads the data. We use the mode="wb" argument in download.file() to tell R to use a binary download instead of a text download.

aFile <- "https://programminghistorian.org/assets/geospatial-data-analysis/data.zip"

# check to see whether file exists before downloading and unzipping it
if(!file.exists("data.zip")) {
     download.file(aFile,"data.zip",mode="wb")
     unzip("data.zip")
}
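A small variation, offered as a sketch and not as part of the original script: the code above unzips only when data.zip is missing, so a data.zip left over from an interrupted run would never be extracted. Checking the archive and the extracted folder separately avoids that. The function name ensure_data() and its arguments are hypothetical; download.file(), unzip(), file.exists() and dir.exists() are standard R functions.

```r
# Hypothetical helper: download the archive and extract it only when needed.
# Assumes the archive contains a top-level "data" folder, so that unzipping
# into the working directory produces ./data.
ensure_data <- function(url, zip_path = "data.zip", data_dir = "data") {
  if (!file.exists(zip_path)) {
    download.file(url, zip_path, mode = "wb")  # binary download
  }
  if (!dir.exists(data_dir)) {
    unzip(zip_path)  # extracts into the current working directory
  }
  invisible(data_dir)
}
```

With the URL stored in aFile as above, ensure_data(aFile) could then be run repeatedly without downloading or extracting twice.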

Having downloaded and unzipped the file, we can verify that its contents have been extracted to ./data with list.files().

# confirm that data is in the right directory
list.files("./data")

> list.files("./data")
[1] "County1990_Data"          "County1990ussm"          
[3] "DP_TableDescriptions.xls" "ExtendedZIP5.csv"        
[5] "GeocodedAddresses.csv"    "Religion"                

Now that we've confirmed the presence of County1990ussm, we can load the file into memory using the code from the original post.

library(sf)
library(tmap)
library(plotly)
County_Aggregate_Data <- st_read("./data/County1990ussm/")
head(County_Aggregate_Data)

...and the output:

> head(County_Aggregate_Data)
Simple feature collection with 6 features and 20 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -1224327 ymin: -932167.3 xmax: 1843060 ymax: 1066589
Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
  DECADE NHGISNAM NHGISST NHGISCTY ICPSRST ICPSRCTY ICPSRNAM       STATENAM
1   1990     York     420     1330      14     1330     YORK   Pennsylvania
2   1990  Sherman     200     1810      32     1810  SHERMAN         Kansas
3   1990   Onslow     370     1330      47     1330   ONSLOW North Carolina
4   1990 Gallatin     300     0310      64      310 GALLATIN        Montana
5   1990    Ocean     340     0290      12      290    OCEAN     New Jersey
6   1990   Uvalde     480     4630      49     4630   UVALDE          Texas
  ICPSRSTI ICPSRCTYI ICPSRFIP STATE COUNTY  PID X_CENTROID Y_CENTROID  GISJOIN
1       14      1330        0   420   1330  936  1621651.3   436217.9 G4201330
2       32      1810        0   200   1810 1078  -488123.2   222198.8 G2001810
3       47      1330        0   370   1330 1114  1675585.4  -145184.9 G3701330
4       64       310        0   300   0310 1350 -1179798.5   996189.2 G3000310
5       12       290        0   340   0290 2426  1823185.7   479312.4 G3400290
6       49      4630        0   480   4630 2979  -365338.2  -901445.2 G4804630
  GISJOIN2 SHAPE_AREA SHAPE_LEN                       geometry
1  4201330 2357546914  252994.0 MULTIPOLYGON (((1617516 458...
2  2001810 2735057979  209726.2 MULTIPOLYGON (((-460646 244...
3  3701330 1993173332  723453.4 MULTIPOLYGON (((1680399 -17...
4  3000310 6559312252  558780.0 MULTIPOLYGON (((-1160194 10...
5  3400290 1704201820  649824.4 MULTIPOLYGON (((1833940 506...
6  4804630 4036829106  254728.5 MULTIPOLYGON (((-348764 -87...
> 

Another benefit of this approach is that the analysis is reproducible: because the script references the source data file, anyone can run the analysis without already having the data, as long as the Programming Historian website remains available.
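If st_read() still reports that the folder does not exist even though list.files() shows it, one thing to try is pointing st_read() at the .shp file itself, with no trailing slash on the folder name. Whether the trailing slash is the cause in the original post is an assumption; nothing in this thread confirms it. find_shapefiles() below is a hypothetical helper, not an sf function.

```r
# Hypothetical helper: return the .shp files inside a folder,
# ignoring any trailing slash or backslash on the folder name.
find_shapefiles <- function(dir) {
  dir <- sub("[/\\\\]+$", "", dir)
  if (!dir.exists(dir)) {
    stop("Directory not found: ", normalizePath(dir, mustWork = FALSE))
  }
  list.files(dir, pattern = "\\.shp$", full.names = TRUE, ignore.case = TRUE)
}

# Only runs when the lesson data and the sf package are both available.
if (dir.exists("./data/County1990ussm") &&
    requireNamespace("sf", quietly = TRUE)) {
  shp <- find_shapefiles("./data/County1990ussm/")
  if (length(shp) > 0) {
    County_Aggregate_Data <- sf::st_read(shp[1])
  }
}
```

The error message from find_shapefiles() prints the full path it looked for, which also helps confirm what "./" resolves to in the current session.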

Upvotes: 2
