Jerry07

Reputation: 941

How can I read a file that has no filename extension in R?

I am working with a climate dataset in R. I downloaded yearly temperature and precipitation observations for the whole globe from here: climate data archive. Example datasets are yearly temperature data for all countries and yearly precipitation data for all countries. However, the files have no filename extension; the proper extension is missing or was forgotten. I tried base::scan() to load them into R, but the output is not what I want: each file should have 14 fixed columns, whereas scan() gives me only 7 values per line. Is there a better function for reading a file that has no specific filename extension?

Here is what the list of climate data files looks like:

list.files("stella/data/air_temp_1980_2014/", recursive = TRUE)

 [1] "air_temp.1980" "air_temp.1981" "air_temp.1982" "air_temp.1983"
 [5] "air_temp.1984" "air_temp.1985" "air_temp.1986" "air_temp.1987"
 [9] "air_temp.1988" "air_temp.1989" "air_temp.1990" "air_temp.1991"
[13] "air_temp.1992" "air_temp.1993" "air_temp.1994" "air_temp.1995"
[17] "air_temp.1996" "air_temp.1997" "air_temp.1998" "air_temp.1999"
[21] "air_temp.2000" "air_temp.2001" "air_temp.2002" "air_temp.2003"
[25] "air_temp.2004" "air_temp.2005" "air_temp.2006" "air_temp.2007"
[29] "air_temp.2008" "air_temp.2009" "air_temp.2010" "air_temp.2011"
[33] "air_temp.2012" "air_temp.2013" "air_temp.2014"

Here is the output that scan() produces:

> scan(file = "stella/data/air_temp_1980_2014/air_temp.1980", sep = "", skip = 1)

[1] -179.75   68.75  -27.00  -28.20  -27.20  -21.60   -9.00
   [8]    0.60    2.80    1.90   -0.20  -11.90  -22.70  -25.10
  [15] -179.75   68.25  -27.80  -28.50  -27.50  -22.00   -9.50
  [22]    0.40    3.00    1.80   -0.80  -12.70  -23.60  -26.80
  [29] -179.75   67.75  -26.80  -26.60  -25.70  -20.50   -8.00
  [36]    2.70    6.00    4.00    0.50  -12.20  -23.20  -27.30
  [43] -179.75   67.25  -29.10  -28.40  -27.50  -22.30   -9.70
  [50]    2.20    6.20    3.30   -1.30  -15.40  -26.40  -31.10
  [57] -179.75   66.75  -25.40  -23.80  -22.90  -18.20   -6.10
  [64]    3.80    8.60    6.00    1.10  -11.50  -22.30  -27.20

Desired output:

> desired output
         Long    Lat   Jan   Feb   Mar April   May   Jun   Jul
1     -179.75  68.75 -27.0 -28.2 -27.2 -21.6  -9.0   0.6   2.8
2     -179.75  68.25 -27.8 -28.5 -27.5 -22.0  -9.5   0.4   3.0
3     -179.75  67.75 -26.8 -26.6 -25.7 -20.5  -8.0   2.7   6.0
4     -179.75  67.25 -29.1 -28.4 -27.5 -22.3  -9.7   2.2   6.2
5     -179.75  66.75 -25.4 -23.8 -22.9 -18.2  -6.1   3.8   8.6
6     -179.75  66.25 -21.5 -18.9 -17.2 -14.0  -2.3   3.4   9.2
7     -179.75  65.75 -20.2 -17.9 -17.1 -13.2  -2.2   4.3  10.1
8     -179.75  65.25 -20.0 -18.7 -17.4 -14.1  -2.4   4.3  10.5
9     -179.75 -16.75  27.4  28.3  27.9  27.2  25.7  24.9  24.7
10    -179.75 -84.75 -18.9 -27.9 -38.6 -41.5 -41.2 -44.4 -45.2
11    -179.75 -85.25 -23.9 -33.8 -45.1 -47.9 -47.7 -50.4 -51.5
12    -179.75 -85.75 -22.8 -33.5 -45.2 -48.1 -47.7 -49.9 -51.4
13    -179.75 -86.25 -24.3 -35.5 -47.7 -50.6 -50.2 -52.1 -53.8
14    -179.75 -86.75 -25.5 -37.1 -49.6 -52.6 -52.1 -53.8 -55.7
15    -179.75 -87.25 -26.2 -38.1 -50.9 -53.8 -53.2 -54.8 -56.8
16    -179.75 -87.75 -26.7 -39.0 -51.9 -54.8 -54.3 -55.7 -57.9

I want to read all of the listed files into R. How can I read the data above correctly, so that it looks like the desired output?

Upvotes: 1

Views: 2467

Answers (2)

cparmstrong

Reputation: 819

It appears to be a tab-delimited file (you can confirm this by adding a .txt extension and opening it). If you add a .csv extension to each of the files and then read them in explicitly using whitespace as the delimiter, it should work fine. This may be tedious, but it is likely your best option, as files without an appropriate extension are confusing in their own right.

Be careful with the column names, though, because the files do not carry them. By default read.csv() treats the first line as a header, so you need to pass a vector of names to the function as well (and set header = FALSE if the first line is data rather than a header).

name_vector <- c("Long", "Lat", ... )

x <- read.csv("path/precip.1980.csv", sep = "", col.names = name_vector)
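As a minimal sketch of the same idea on a file that keeps its original name: the example below assumes the data is whitespace-delimited with 14 numeric columns and no header line. The two sample rows are copied from the scan() output in the question, the temporary file is a hypothetical stand-in, and the Aug to Dec column names are guesses (the question only shows names up to Jul).

```r
# Hypothetical sample: two rows taken from the question's scan() output,
# written to a temporary file named like the originals (no real extension).
tmp <- file.path(tempdir(), "air_temp.1980")
writeLines(c(
  "-179.75 68.75 -27.0 -28.2 -27.2 -21.6 -9.0 0.6 2.8 1.9 -0.2 -11.9 -22.7 -25.1",
  "-179.75 68.25 -27.8 -28.5 -27.5 -22.0 -9.5 0.4 3.0 1.8 -0.8 -12.7 -23.6 -26.8"
), tmp)

# Assumed names: Long, Lat, Jan ... Jul come from the question; Aug-Dec are guesses.
name_vector <- c("Long", "Lat", "Jan", "Feb", "Mar", "April", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# sep = "" splits on any run of whitespace (spaces or tabs);
# header = FALSE keeps the first line as data instead of using it as a header.
x <- read.table(tmp, header = FALSE, sep = "", col.names = name_vector)

dim(x)   # rows and columns actually read
head(x)
```

If the real files do start with a line that should be dropped, adding skip = 1 (as in the scan() call from the question) would be one way to handle it.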

edit:

Since you already have the file list, you can paste a ".csv" onto the end of each element of the file list vector instead of renaming everything by hand. Note that read.csv() itself does not inspect the extension, so renaming is a convenience rather than a strict requirement; the names you pass to read.csv() just have to match the files on disk.

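# store file list with full paths so the files are found from any working directory
oldnames <- list.files("stella/data/air_temp_1980_2014/", recursive = TRUE,
                       full.names = TRUE)

# paste extension, then rename the files on disk so the new names actually exist
filelist <- paste0(oldnames, ".csv")
file.rename(oldnames, filelist)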

Then you can read the files in iteratively with my code from above. Here's an example that could do that for you.

dat <- lapply(filelist, function (x) {
   read.csv(x, sep = "", col.names = name_vector)
})

I haven't explicitly tested this solution, and it still likely will present errors because of the column names issue. If you'd provide a proper reprex it would be much easier to troubleshoot these issues for you.
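For reference, here is a self-contained sketch of the loop that can be run as-is. It makes several assumptions: the directory and its two tiny files are hypothetical stand-ins created in a temp folder, the files are whitespace-delimited with no header, and the Aug to Dec names are guesses. It also keeps the original file names rather than adding an extension.

```r
# Hypothetical stand-in for the data directory: two one-row files in a temp folder.
data_dir <- file.path(tempdir(), "air_temp_demo")
dir.create(data_dir, showWarnings = FALSE)
sample_row <- "-179.75 68.75 -27.0 -28.2 -27.2 -21.6 -9.0 0.6 2.8 1.9 -0.2 -11.9 -22.7 -25.1"
for (yr in c(1980, 1981)) {
  writeLines(sample_row, file.path(data_dir, paste0("air_temp.", yr)))
}

# Assumed names: Long, Lat, Jan ... Jul come from the question; Aug-Dec are guesses.
name_vector <- c("Long", "Lat", "Jan", "Feb", "Mar", "April", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# full.names = TRUE returns paths that read.table() can open directly.
files <- list.files(data_dir, pattern = "^air_temp\\.[0-9]{4}$", full.names = TRUE)

# Read every file into a list of data frames and label each element by year.
dat <- lapply(files, read.table, header = FALSE, sep = "", col.names = name_vector)
names(dat) <- sub("^air_temp\\.", "", basename(files))

# Optional: stack everything into one data frame with a Year column in front.
all_years <- do.call(rbind, Map(function(d, yr) cbind(Year = as.integer(yr), d),
                                dat, names(dat)))
str(all_years)
```

Keeping the per-year data frames in a named list makes it easy to pull out a single year, for example dat[["1980"]], while the stacked version is handier for plotting or modelling across years.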

Upvotes: 1

JAD

Reputation: 2180

The filename extension in itself doesn't mean much. It is only a hint about how the data in the file is formatted. You should open the file in a text editor to figure out how the data is actually laid out.

From the looks of it, and according to the other answer, it might be a tab-delimited csv file. So the way to import it into R is to use the CSV-related input functions, like read.csv or data.table::fread.
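The same inspection can be done from within R. The sketch below assumes a whitespace-delimited file; the temporary file and its two rows (copied from the question's scan() output) are hypothetical stand-ins for one of the real files.

```r
# Hypothetical sample file without a meaningful extension.
tmp <- file.path(tempdir(), "air_temp.sample")
writeLines(c(
  "-179.75 68.75 -27.0 -28.2 -27.2 -21.6 -9.0 0.6 2.8 1.9 -0.2 -11.9 -22.7 -25.1",
  "-179.75 68.25 -27.8 -28.5 -27.5 -22.0 -9.5 0.4 3.0 1.8 -0.8 -12.7 -23.6 -26.8"
), tmp)

# Look at the raw text first: this shows the delimiter and any header line.
readLines(tmp, n = 2)

# count.fields() reports how many whitespace-separated fields each line has,
# which is a quick check that every row really has the expected 14 columns.
n_fields <- count.fields(tmp, sep = "")
table(n_fields)

# Once the layout is known, read the file with a matching reader.
x <- read.table(tmp, header = FALSE, sep = "")
str(x)
```

If the data.table package is installed, data.table::fread(tmp) is an alternative that tries to detect the separator on its own.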

Upvotes: 2
