Reputation: 401
I want to read in a table from a NOAA file hosted online. The file is a list of stations in various cities. My trouble is reading the data in: the columns do not seem to be separated consistently, so I have to set fill=TRUE, which splits multi-word city names across several columns. This is clearly not what I want, but I cannot see how to correct for it. Is there a way to have the last few columns read in together as one column? Or should I be using something other than read.table altogether? Any help is appreciated!
The code is below.
url <- "ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/station-inventories/temp-inventory.txt"
stations <- read.table(url, header=FALSE, skip=2, fill=TRUE, nrows = 5,
col.names = c("ID","lat","lon","UNK","State","City","UNK2","UNK3","UNK4")
)
stations
ID lat lon UNK State City UNK2 UNK3 UNK4
1 CQC00914080 15.2136 145.7497 252.1 MP CAPITOL HILL 1 TRADITIONAL
2 CQC00914801 14.1717 145.2428 179.2 MP ROTA AP 91221 TRADITIONAL
3 FMC00914395 5.3544 162.9533 2.1 FM KOSRAE 91355 TRADITIONAL
4 FMC00914419 5.5167 153.8167 1.5 FM LUKUNOCH TRADITIONAL
5 FMC00914446 9.6053 138.1786 14.9 FM MAAP TRADITIONAL
The original source with the relevant lines is this:
CQC00914080 15.2136 145.7497 252.1 MP CAPITOL HILL 1 TRADITIONAL
CQC00914801 14.1717 145.2428 179.2 MP ROTA AP 91221 TRADITIONAL
FMC00914395 5.3544 162.9533 2.1 FM KOSRAE 91355 TRADITIONAL
FMC00914419 5.5167 153.8167 1.5 FM LUKUNOCH TRADITIONAL
FMC00914446 9.6053 138.1786 14.9 FM MAAP TRADITIONAL
Upvotes: 2
Reputation: 93938
Looks like a fixed-width file, which can be appropriately processed using read.fwf (see ?read.fwf). Here's the full line that seems to work to import the file:
read.fwf(url, widths=c(11,9,10,7,4,31,3,10,13), strip.white=TRUE, comment.char="")
The comment.char="" is necessary because there are # characters within the text file, which R interprets as comment characters by default. Everything after a # on a line is then dropped, so certain lines throw an error because R doesn't find all the columns it needs.
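To see the effect without hitting the FTP server, here is a minimal sketch on made-up data. The three records, the column names and the widths c(4, 7, 12) are all invented for illustration and are not the NOAA layout; the point is only that a multi-word city and a city containing # both come through intact once the fields are cut by position and comment.char="" is set.

```r
# Hypothetical sample records (NOT taken from the NOAA file).
rows <- data.frame(
  id   = c("ST1", "ST2", "ST3"),
  lat  = c(15.2, 14.1, 5.3),
  city = c("CAPITOL HILL", "ROTA AP", "SITE #2"),
  stringsAsFactors = FALSE
)

# Build fixed-width lines: 4 chars for id, 7 for lat (6 digits + 1 space),
# 12 for city, padded with spaces.
lines <- sprintf("%-4s%6.1f %-12s", rows$id, rows$lat, rows$city)

# Write them to a temporary file so read.fwf has something to read.
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Cut each line by position; comment.char = "" keeps the "#" in "SITE #2".
stations <- read.fwf(
  tmp,
  widths           = c(4, 7, 12),
  col.names        = c("id", "lat", "city"),
  strip.white      = TRUE,
  comment.char     = "",
  stringsAsFactors = FALSE
)

unlink(tmp)
print(stations)
```

Because the split is by character position rather than by whitespace, "CAPITOL HILL" stays in a single column, which is what fill=TRUE could not do for the question's data.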
Upvotes: 3
Reputation: 19544
It works fine with read_table from package readr:
readr::read_table(url, skip=2, n_max = 5,col_names=FALSE)
cols(
X1 = col_character(),
X2 = col_double(),
X3 = col_double(),
X4 = col_double(),
X5 = col_character(),
X6 = col_character(),
X7 = col_character(),
X8 = col_character(),
X9 = col_integer(),
X10 = col_character()
)
# A tibble: 5 × 10
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
<chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <int> <chr>
1 CQC00914080 15.2136 145.7497 252.1 MP CAPITOL HILL 1 NA TRADITIONAL
2 CQC00914801 14.1717 145.2428 179.2 MP ROTA AP 91221 TRADITIONAL
3 FMC00914395 5.3544 162.9533 2.1 FM KOSRAE 91355 TRADITIONAL
4 FMC00914419 5.5167 153.8167 1.5 FM LUKUNOCH NA TRADITIONAL
5 FMC00914446 9.6053 138.1786 14.9 FM MAAP NA TRADITIONAL
Upvotes: 2