Reputation: 2854
I have some trouble importing a text file seperated by space / multiple spaces that also contains columns with strings that have spaces that should not be interpreted as seperators!
The table has no column names and a maximum of 9 columns. Column 6 is cmoprised by a string with spaces in it. Columns 4, 7, 8, 9 are optional and partly missing.
My idea was to use the fixed column widths when reading the table but was technically not able to achieve this..
Here is the file-url: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
Because read.table throws an error,
> read.table("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt",sep="")
Fehler in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Zeile 1 hatte keine 9 Elemente
I did this instead
lines <- readLines("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt")
Here's a sample of lines
:
a <- c("USC00080211 29.7258 -85.0206 6.1 FL APALACHICOLA AIRPORT HCN 72220",
"USC00080228 27.2181 -81.8739 9.1 FL ARCADIA HCN ",
"USC00080236 27.1819 -81.3508 42.7 FL ARCHBOLD BIO STN ",
"USC00080369 27.5947 -81.5267 46.9 FL AVON PARK 2 W ",
"USC00080374 27.6000 -81.5000 46.0 FL AVON PARK 1 NW ",
"USC00080390 27.8500 -81.5167 38.1 FL BABSON PARK 1 ENE ",
"USC00080414 24.6589 -81.2761 0.9 FL BAHIA HONDA SP ",
"USC00080478 27.8986 -81.8433 38.1 FL BARTOW HCN ",
"ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD ",
"ACW00011647 17.1333 -61.7833 19.2 ST JOHNS ",
"AE000041196 25.3330 55.5170 34.0 SHARJAH INTER. AIRP GSN 41196"
)
tf <- tempfile(fileext=".txt")
writeLines(a,tf)
shell.exec(tf)
#read.table(tf, sep = "", ??)
Upvotes: 0
Views: 1381
Reputation: 13103
I recently encountered an issue that matches the title of this post. Albeit my problem was much simpler than the one in the body of the original post. I could not find a solution to my issue anywhere. Given how much trouble this issue caused me I thought I would go ahead and post my solution here in the hope it might be helpful to others.
my.data1 = read.csv(text = '
1,10,40,30,X_ ,
2,80,90,20,Y_,
3,30,40,50,Z_ ,
', header = FALSE, stringsAsFactors = FALSE)
my.data1 <- my.data1[,1:(ncol(my.data1)-1)]
my.data2 = read.table(text = '
1 10 40 30 \'X_ \'
2 80 90 20 \'Y_\'
3 30 40 50 \'Z_ \'
', header = FALSE, stringsAsFactors = FALSE, sep="")
my.data2
# V1 V2 V3 V4 V5
#1 1 10 40 30 X_
#2 2 80 90 20 Y_
#3 3 30 40 50 Z_
all.equal(my.data1, my.data2)
#[1] TRUE
Upvotes: 1
Reputation: 2854
For the record:
Here's the solution I found thanks to @bdecaf..
lines <- readLines("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt")
w <- list(c(1,12), c(13,22), c(23,31), c(32,38), c(39,41), c(42,72), c(73,76), c(77,80), c(81,87))
ns <- c()
for(i in 1:length(w)) {
ns[i] <- paste("C", i, sep = ".")
assign(ns[i], str_trim(substring(lines, w[[i]][1], w[[i]][2])))
}
obj.list <- lapply(ns, get)
names(obj.list) <- ns
df <- data.frame(obj.list)
Upvotes: 1