Jonathan

Reputation: 65

Is one day (20140319.export.CSV.zip) of data missing from the GDELT event files?

I am working with GDELT data in R using the {GDELTtools} package.

When downloading the GDELT database using GetAllOfGDELT() or through a web browser, the daily file for 2014/03/19 (20140319.export.CSV.zip) appears to be missing: the server returns "404 Not Found" for it. GetAllOfGDELT() stops with an error at that file, and the gap also creates problems for subsequent data analyses.

Questions: Is this a temporary issue? Has anyone else run into the same issue?

Here is the related code and output:

> # Download the entire GDELT database
> GetAllOfGDELT(local.folder = "./Data",
+               data.url.root = "http://data.gdeltproject.org/events/", 
+               force = FALSE)
The compressed GDELT data set is currently 12.3GB. It will take a long time to download and
requires a lot of room (12.3GB) where you store it. Please verify that you have sufficient free
space on the drive where you intend to store it.
Are you ready to proceed? (y/n) y
Downloading or verifying 1979.zip succeeded.
Downloading or verifying 1980.zip succeeded.
...
Downloading or verifying 20140317.export.CSV.zip succeeded.
Downloading or verifying 20140318.export.CSV.zip succeeded.
trying URL 'http://data.gdeltproject.org/events/20140319.export.CSV.zip'
Error in download.file(url = paste(data.url.root, f, sep = ""), destfile = paste(local.folder,  : 
  cannot open URL 'http://data.gdeltproject.org/events/20140319.export.CSV.zip'
In addition: Warning message:
In download.file(url = paste(data.url.root, f, sep = ""), destfile = paste(local.folder,  :
  cannot open: HTTP status was '404 Not Found'
>

The online "All GDELT Event Files" directory listing also skips that date:

20140321.export.CSV.zip (9.9MB) (MD5: d492ca38db3c8f40b657b0eb2415f950)
20140320.export.CSV.zip (10.6MB) (MD5: 8602497fdc0f54861c056d33fb64f3b8)
20140318.export.CSV.zip (10.7MB) (MD5: cf0c2a30b09cdbc28204eb0eca53db1e)
20140317.export.CSV.zip (9.8MB) (MD5: 61e70e4ff79e590abddd6f26f8dfa552)

Source: http://data.gdeltproject.org/events/index.html
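To check for other gaps like this one, the listed file names can be compared against a full calendar sequence. This is only a minimal sketch in base R: `find_missing_days()` is a hypothetical helper (not part of GDELTtools), and it assumes every name starts with a YYYYMMDD date, as the daily files in the listing above do.

```r
# File names copied from the directory listing above.
listed <- c("20140321.export.CSV.zip",
            "20140320.export.CSV.zip",
            "20140318.export.CSV.zip",
            "20140317.export.CSV.zip")

# Hypothetical helper, not part of GDELTtools.
find_missing_days <- function(file.names) {
  # Parse the leading YYYYMMDD part of each file name into a Date.
  days <- as.Date(substr(file.names, 1, 8), format = "%Y%m%d")
  # Every calendar day between the earliest and latest listed file.
  expected <- seq(min(days), max(days), by = "day")
  # Days that should be present but do not appear in the listing.
  missing <- expected[!expected %in% days]
  if (length(missing) == 0) return(character(0))
  paste0(format(missing, "%Y%m%d"), ".export.CSV.zip")
}

missing.files <- find_missing_days(listed)
print(missing.files)
```

For the four files shown, the only name reported is 20140319.export.CSV.zip.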

A partial workaround is to request only the files dated after 2014/03/19 with GetGDELT(). This downloads the remaining event files, but it does not recover the missing day:

# Download all GDELT event files from 2014/03/20 through 2015/01/01
GetGDELT(start.date = "2014/03/20", 
         end.date = "2015/01/01", 
         local.folder = "./Data", 
         data.url.root = "http://data.gdeltproject.org/events/",
         verbose = TRUE)
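Another possible approach, sketched here under stated assumptions, is to download the daily files one at a time and skip any that fail, so that a single 404 does not abort the whole run. `download_available()` and its `downloader` argument are hypothetical names introduced for illustration and are not part of GDELTtools. The sketch relies only on base R's `download.file()` and `tryCatch()`, and it treats any warning or error during a download as a failure.

```r
# Hypothetical helper, not part of GDELTtools.
# `downloader` defaults to base R's download.file(); it is a parameter only
# so that the function can be tried out without network access.
download_available <- function(file.names,
                               local.folder = "./Data",
                               data.url.root = "http://data.gdeltproject.org/events/",
                               downloader = utils::download.file) {
  dir.create(local.folder, showWarnings = FALSE, recursive = TRUE)
  failed <- character(0)
  for (f in file.names) {
    dest <- file.path(local.folder, f)
    if (file.exists(dest)) next  # already present locally, skip it
    ok <- tryCatch({
      status <- downloader(url = paste0(data.url.root, f),
                           destfile = dest, mode = "wb", quiet = TRUE)
      identical(as.integer(status), 0L)  # download.file() returns 0 on success
    },
    error = function(e) FALSE,     # e.g. "cannot open URL"
    warning = function(w) FALSE)   # e.g. "HTTP status was '404 Not Found'"
    if (!ok) {
      unlink(dest)                 # remove any partial file
      failed <- c(failed, f)
    }
  }
  failed                           # names of the files that could not be fetched
}

# Daily file names for the days around the gap.
days <- seq(as.Date("2014-03-17"), as.Date("2014-03-21"), by = "day")
daily.files <- paste0(format(days, "%Y%m%d"), ".export.CSV.zip")

# Uncomment to run against the real server:
# failed <- download_available(daily.files)
```

The returned vector lists the files that could not be fetched, which makes any gaps explicit for later analysis instead of stopping at the first missing file.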

Note: A Google search for "20140319.export.CSV.zip" returns 0 results, whereas searches for the other file names do return useful results.

Upvotes: 1

Views: 348

Answers (0)
