Reputation: 301
I am trying to loop through a directory and read all of the files into a list. These files are all from the same GitHub repo, found here: https://github.com/CSSEGISandData/COVID-19
path = "~/Documents/Corona_Virus/COVID-19/archived_data/archived_daily_case_updates/"
setwd(path)
file.names<-list.files(path)
archived_DAYS<-lapply(file.names,read.csv,sep=",",header=T)
goes off without a hitch, but then
path2 = "~/Documents/Corona_Virus/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/"
setwd(path2)
daily_file_names<-list.files(path2)
daily_DAYS<-lapply(daily_file_names,read.csv,sep=",")
throws the error
"Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input"
However, both directories contain .csv files that are all structured the same way. I don't see why it's throwing that error, as every file has populated data.
Upvotes: 2
Views: 6838
Reputation: 10855
To read the files locally in R, first clone the GitHub repository and set the R working directory to the root directory of the clone.
With the working directory set this way, the following code will retrieve all the daily archived files and read them into a list of data frames.
#
# archived days data
#
theFiles <- list.files("./archived_data/archived_daily_case_updates",pattern="*.csv",full.names = TRUE)
dataList <- lapply(theFiles,read.csv,stringsAsFactors=FALSE)
We can print the first few rows of data from the first data frame in the resulting list as follows.
> head(dataList[[1]])
ï..Province.State Country.Region Last.Update Confirmed Deaths Recovered Suspected
1 Anhui Mainland China 1/21/2020 10pm NA NA NA 3
2 Beijing Mainland China 1/21/2020 10pm 10 NA NA NA
3 Chongqing Mainland China 1/21/2020 10pm 5 NA NA NA
4 Guangdong Mainland China 1/21/2020 10pm 17 NA NA 4
5 Guangxi Mainland China 1/21/2020 10pm NA NA NA 1
6 Guizhou Mainland China 1/21/2020 10pm NA NA NA 1
>
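The ï.. prefix on the first column name in the output above is most likely a UTF-8 byte-order mark (BOM) at the start of the file being read as ordinary characters. Here is a minimal sketch of one way to avoid it, assuming the files are UTF-8 with a BOM. The demo file is hypothetical and is written to a temporary location so the example is self-contained.

```r
# Hypothetical demo file standing in for one of the repository's CSV files.
demo_file <- tempfile(fileext = ".csv")
con <- file(demo_file, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)   # UTF-8 byte-order mark
writeBin(charToRaw("Province/State,Confirmed\nAnhui,1\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" asks R to drop the BOM while reading.
with_bom <- read.csv(demo_file, stringsAsFactors = FALSE,
                     fileEncoding = "UTF-8-BOM")
names(with_bom)
```

The same argument can be passed through lapply(), for example lapply(theFiles, read.csv, stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM").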
Note that the full.names = TRUE argument in list.files() is needed to include the path in the resulting list of file names.
> # show path names in list of files
> head(theFiles)
[1] "./archived_data/archived_daily_case_updates/01-21-2020_2200.csv"
[2] "./archived_data/archived_daily_case_updates/01-22-2020_1200.csv"
[3] "./archived_data/archived_daily_case_updates/01-23-2020_1200.csv"
[4] "./archived_data/archived_daily_case_updates/01-24-2020_0000.csv"
[5] "./archived_data/archived_daily_case_updates/01-24-2020_1200.csv"
[6] "./archived_data/archived_daily_case_updates/01-25-2020_0000.csv"
>
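To make the effect of full.names concrete, here is a small sketch that uses a temporary directory. The directory and file names are hypothetical and are not part of the repository.

```r
# Hypothetical demo directory containing one small CSV file.
demo_dir <- file.path(tempdir(), "full_names_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "a.csv"), row.names = FALSE)

# Without full.names, only the bare file name is returned.
bare_names <- list.files(demo_dir, pattern = "\\.csv$")

# With full.names = TRUE, the directory is prepended to each name.
full_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# The full paths can be read from any working directory.
demo_list <- lapply(full_paths, read.csv, stringsAsFactors = FALSE)
```

The bare names can be opened only when the working directory is the directory that was listed, which is why the original code relied on setwd().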
In the comments to my answer, the original poster asked why the code for the daily reports failed. My hypothesis was that the existence of a README.md file in the subdirectory caused read.csv() to fail. Since my answer used pattern = '*.csv' in list.files(), it avoids reading a non-csv file with read.csv().
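One caveat: the pattern argument of list.files() is a regular expression, not a shell glob. A stricter filter is sketched below as an alternative. The file names in candidate_names are made up for illustration.

```r
# Hypothetical file names, used only to illustrate the matching rules.
candidate_names <- c("01-22-2020.csv", "README.md", "notes_csv.txt", "old.csv.bak")

# Regular expression: a literal dot, then "csv", anchored at the end of the name.
strict_match <- grepl("\\.csv$", candidate_names)

# glob2rx() (from the utils package) converts a shell-style glob to a regex.
glob_match <- grepl(glob2rx("*.csv"), candidate_names)

candidate_names[strict_match]
```

Either pattern = "\\.csv$" or pattern = glob2rx("*.csv") can be passed to list.files() in place of the glob-style string.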
I ran the following code to test this hypothesis.
# replicate original error
originalDirectory <- getwd()
path2 =paste0(originalDirectory, "/csse_covid_19_data/csse_covid_19_daily_reports")
setwd(path2)
daily_file_names<-list.files(path2)
daily_DAYS<-lapply(daily_file_names,read.csv,sep=",")
I received the same error as documented in the original post.
> # replicate original error
> originalDirectory <- getwd()
> path2 =paste0(originalDirectory, "/csse_covid_19_data/csse_covid_19_daily_reports")
> setwd(path2)
> daily_file_names<-list.files(path2)
> daily_DAYS<-lapply(daily_file_names,read.csv,sep=",")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
>
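Because lapply() stops at the first failure, the error message does not say which file caused it. One way to find the offending file is to wrap each read in tryCatch(), sketched here on a temporary directory. The empty README.md is an assumption made for the demo, not a statement about the repository's contents.

```r
# Hypothetical demo directory: one valid CSV plus an empty non-CSV file.
demo_dir <- file.path(tempdir(), "diagnose_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Confirmed = 1:2),
          file.path(demo_dir, "01-22-2020.csv"), row.names = FALSE)
file.create(file.path(demo_dir, "README.md"))   # empty file (assumption)

all_files <- list.files(demo_dir, full.names = TRUE)

# Record "ok" or the error message for each file instead of stopping.
read_status <- vapply(all_files, function(f) {
  tryCatch({
    read.csv(f)
    "ok"
  }, error = function(e) conditionMessage(e))
}, character(1))

names(read_status) <- basename(all_files)
read_status
```

Any entry other than "ok" names a file that read.csv() could not parse, along with the message it produced.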
After adding pattern = '*.csv' to list.files(), the code works correctly. Here the bare file names resolve because the earlier setwd(path2) call is still in effect.
> # use pattern = "*.csv"
> daily_file_names<-list.files(path2,pattern = "*.csv")
> daily_DAYS<-lapply(daily_file_names,read.csv,sep=",")
> head(daily_DAYS[[1]])
ï..Province.State Country.Region Last.Update Confirmed Deaths Recovered
1 Anhui Mainland China 1/22/2020 17:00 1 NA NA
2 Beijing Mainland China 1/22/2020 17:00 14 NA NA
3 Chongqing Mainland China 1/22/2020 17:00 6 NA NA
4 Fujian Mainland China 1/22/2020 17:00 1 NA NA
5 Gansu Mainland China 1/22/2020 17:00 NA NA NA
6 Guangdong Mainland China 1/22/2020 17:00 26 NA NA
>
Upvotes: 4