Reputation: 193
I'm sharing here a text file a need to read into R
https://www.filehosting.org/file/details/846022/PLT_PERIOD_ALL_175_P98.PLT
this is what I've done so far
fname<-'PLT_PERIOD_ALL_175_P98.PLT'
rt0<-read.table(fname, skip=8, header = FALSE,
stringsAsFactors = FALSE)
##Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
## line 84 did not have 10 elements
rt1<-read.table(fname, skip=8, header = FALSE,
stringsAsFactors = FALSE, fill=TRUE)
# this is messing up the last two columns by row 84
rt2<-read.table(fname, skip=8, header = FALSE,
flush = TRUE, stringsAsFactors = FALSE, fill=TRUE)
# this is skipping one column
rt3<-read.table(fname, skip=8, header = FALSE, sep='\t',
flush = TRUE, stringsAsFactors = FALSE, fill=TRUE)
# this is reading all in one column
Upvotes: 0
Views: 799
Reputation: 193
here some of my many failing attempts to read first two columns (just for proceeding with a simpler subset)
> read.fwf(fname, widths = c(12,14), skip = 8, sep=' ', dec='.')[1:5,]
V1 V2 V3 V4 V5
1 NA NA 739950.4 0 5083641
2 NA NA 739984.3 0 5083591
3 NA NA 740038.2 0 5083581
4 NA NA 740065.7 0 5083576
5 NA NA 740081.8 0 5083575
> read.fwf(fname, widths = c(15,16), skip = 8, dec='.')[1:5,]
V1 V2
1 739950.4 5083641
2 739984.3 5083591
3 740038.2 5083581
4 740065.7 5083576
5 740081.8 5083575
for some reasons I do not fully understand it seems I can't properly parse the second column (not to mention the others)
Upvotes: 0
Reputation: 263352
There are no separators. So the way to deal with this is to use fwf file logic. There appeared to be 9 lines in the preamble text but one of them was broken with a non-standard linefeed. The 8th line has underscores that define the column locations. The 84th line of data (and subsequent lines as well) has entries in the "NET ID" column that were not present in lines 7-9 above.
* X Y AVERAGE CONC ZELEV ZHILL ZFLAG AVE GRP RANK NET ID DATE(CONC)
* ____________ ____________ ____________ ______ ______ ______ ______ ________ ________ ________ ________
739950.40000 5083641.40000 12.01732 124.31 1564.68 0.00 1-HR ALL 175TH 19120324
---snipped lines------
This is what you see after your read.table( ..., fill=TRUE)
operation with the added data in the 10th column with value "CAR1":
83 739277.6 5083387 0.78225 123.46 1564.68 0 1-HR ALL 175TH 19010108
84 739061.1 5082586 0.34595 122.50 1564.70 0 1-HR ALL 175TH CAR1
85 19052321.0 NA NA NA NA NA
86 739161.1 5082586 0.39795 122.00 1564.70 0 1-HR ALL 175TH CAR1
Here's a trick to create a "ruler" for determining column positions. You can figure out the widths of those underscore columns (including the 2 positions at the beginning and the 3 position gaps using this output:
rt1<-readLines("~/Downloads/PLT_PERIOD_ALL_175_P98.PLT",n=9 ) # first 8 lines
rt2 <- paste0( rep(c(1:9,0),12), collapse="") # one's place ruler
rt2[2] <- paste0( tail( rep( c(0:9, 0:1) ,each=10),-1 ),collapse="") # ten's place ruler
> rt2
[1] "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
[2] "00000000011111111112222222222333333333344444444445555555555666666666677777777778888888888999999999900000000001111111111"
> rt1[8]
[1] "* ____________ ____________ ____________ ______ ______ ______ ______ ________ ________ ________ ________"
And then read:
> ?read.fwf
You read in all the data using the widths.
Upvotes: 2