Reputation: 2763
I am using readLines to read the content of the following text file:
*--------------------------------------------------------------------*
* 7. Measured data *
* And option to force measured LAI during simulation *
* (instead of using simulated values) *
*--------------------------------------------------------------------*
* Observed phenology: only required if program DRATES is run!!
IDOYTR = 194 ! Day of transplanting (give 0 if direct-seeded)
IYRTR = 1991 ! Year of transplanting (give 0 if direct-seeded)
IDOYPI = 240 ! Day of panicle initiation (give -99 if not observed)
IYRPI = 1991 ! Year of panicle initiation (give -99 if not observed)
IDOYFL = 260 ! Day of flowering
IYRFL = 1991 ! Year of flowering
IDOYM = 288 ! Day of maturity
IYRM = 1991 ! Year of maturity
*Leaf Area Index (m2 leaf / m2 ground):
LAI_OBS =
1991., 182., 0.00 ,
1991., 194., 0.028,
1991., 202., 0.185,
1991., 211., 0.325,
1991., 219., 1.048,
1991., 240., 3.680,
1991., 254., 5.010,
1991., 260., 4.628,
1991., 273., 3.520,
1991., 288., 1.938
*-- Parameter to set forcing of observed LAI during simulation
LAI_FRC = 0 ! No forcing
*LAI_FRC = 2 ! Forcing
I need to programmatically extract only the block of text identified by "LAI_OBS =". The line number where "LAI_OBS =" is located varies from file to file. Therefore, I need to find a way to read all the text between the string "LAI_OBS =" and the next blank line.
So far I am using:
l <- readLines('file.txt')
which(l=='LAI_OBS =')
I can identify the initial line of the block I need to extract, but I don't know how to instruct R to go to the first blank line after "LAI_OBS =".
The result I need is a data frame looking like this:
1991 182 0.00
1991 194 0.028
1991 202 0.185
1991 211 0.325
1991 219 1.048
1991 240 3.680
1991 254 5.010
1991 260 4.628
1991 273 3.520
1991 288 1.938
What is a convenient way to do this in R? Thanks.
Upvotes: 2
Views: 1104
Reputation: 395
This works; it is not elegant, but it gets the job done:
l <- readLines('data.txt')
first <- which(l=='LAI_OBS =')
blanks <- which(l=='')
whichblank <- which(which(l=='') > first)
last <- blanks[whichblank]
first
last
outputs:
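[1] 18
[1] 29
Here first is the "LAI_OBS =" line and last is the blank line after the block, so the data rows are l[(first + 1):(last - 1)]. Of course, if there are more blank lines after the block, last will have several elements; in that case just take the first one, blanks[whichblank[1]].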
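To go from these indices to the data frame asked for in the question, a minimal sketch could look like the following. It assumes the block is followed by a blank line, and it uses a shortened, made-up copy of the file held in a character vector in place of readLines('data.txt'). The names block, fields and lai are illustrative only.

```r
# Shortened stand-in for readLines('data.txt'); the real file has more lines
l <- c("IYRM = 1991 ! Year of maturity",
       "",
       "LAI_OBS =",
       "1991., 182., 0.00 ,",
       "1991., 194., 0.028,",
       "1991., 288., 1.938",
       "",
       "LAI_FRC = 0 ! No forcing",
       "")

first  <- which(l == 'LAI_OBS =')
blanks <- which(l == '')
last   <- blanks[blanks > first][1]   # first blank line after the header

block <- l[(first + 1):(last - 1)]    # data rows only

# Drop the trailing comma, split each row on commas, convert to numbers
fields <- strsplit(sub("\\s*,\\s*$", "", block), ",")
lai <- as.data.frame(do.call(rbind, lapply(fields, function(x) as.numeric(trimws(x)))))
```

With the full file, lai would have one row per observation and three columns (year, day of year, LAI), named V1 to V3 by default.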
Upvotes: 1
Reputation: 887241
Get the index of the "LAI_OBS =" line. It looks like == can be used here, but if the line is not always written in exactly the same way, grep is more useful. Then get the indices of the blank elements with nzchar, select the first blank index that is greater than 'i1', take the sequence from 'i1' to 'i3' (after making adjustments, i.e. adding 1 to the start and subtracting 1 from the end), remove the extra characters using sub/gsub, and read the result with read.csv:
i1 <- grep("LAI_OBS =", l)+1
i2 <- which(!nzchar(l))
i3 <- i2[i2>i1][1]-1
read.csv(text=gsub("\\.,", ",", sub("\\s*,$", "", l[i1:i3])), header=FALSE)
# V1 V2 V3
#1 1991 182 0.000
#2 1991 194 0.028
#3 1991 202 0.185
#4 1991 211 0.325
#5 1991 219 1.048
#6 1991 240 3.680
#7 1991 254 5.010
#8 1991 260 4.628
#9 1991 273 3.520
#10 1991 288 1.938
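To see what the two substitutions do, here is a small sketch that applies them to a single row copied from the file, followed by one possible tolerant pattern for locating the header. The pattern "^\\s*LAI_OBS\\s*=" and the sample vector hdr are my own assumptions about how the header might vary, not something taken from the file format.

```r
row <- "1991., 182., 0.00 ,"

# Step 1: drop the trailing comma together with any spaces before it
step1 <- sub("\\s*,$", "", row)
# step1 is "1991., 182., 0.00"

# Step 2: remove the dot that sits directly before each remaining comma
step2 <- gsub("\\.,", ",", step1)
# step2 is "1991, 182, 0.00"

# A tolerant header match (assumed variations: leading spaces, no space before '=')
hdr <- c("LAI_OBS =", "  LAI_OBS=", "*LAI_OBS = 1")
hits <- grep("^\\s*LAI_OBS\\s*=", hdr)
# hits is c(1, 2); the commented-out third line is not matched
```

Anchoring the pattern with ^ keeps lines that start with the * comment character from being picked up as the header.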
Upvotes: 4
Reputation: 521579
From what I gather, the tricky part about your input file is identifying where the input data ends. One approach is to continue down your current path and use which again to match the line that follows the data:
*-- Parameter to set forcing of observed LAI during simulation
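obs.lai <- readLines('file.txt')
idx1 <- which(obs.lai=='LAI_OBS =')
idx2 <- which(substring(obs.lai, 1, 20) == '*-- Parameter to set')
# ':' binds tighter than '-', so parenthesize; obs.lai is a vector, so no comma in [ ]
df.keep <- obs.lai[(idx1+1):(idx2-1)]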
Note that if the file has multiple lines beginning with the 20 characters I attempt to match, you might have to increase the length of the substring. My hunch is that the full line would be unique because it refers to LAI simulation.
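As a rough sketch of how the marker-line idea could be carried through to a data frame: the example below matches a longer prefix with startsWith (base R 3.3.0 or later), drops any blank lines left between the data and the marker, and parses the rows with read.csv. The in-memory vector stands in for readLines('file.txt'), and the cleaning regular expressions are just one possible choice.

```r
# Shortened stand-in for readLines('file.txt')
obs.lai <- c("LAI_OBS =",
             "1991., 182., 0.00 ,",
             "1991., 288., 1.938",
             "",
             "*-- Parameter to set forcing of observed LAI during simulation",
             "LAI_FRC = 0 ! No forcing")

idx1 <- which(obs.lai == 'LAI_OBS =')
idx2 <- which(startsWith(obs.lai, '*-- Parameter to set forcing of observed LAI'))

block <- obs.lai[(idx1 + 1):(idx2 - 1)]
block <- block[nzchar(trimws(block))]      # drop blank lines before the marker

# Remove the trailing comma and the dots before commas, then parse as CSV
clean <- gsub("\\.,", ",", sub("\\s*,$", "", block))
df.keep <- read.csv(text = clean, header = FALSE)
```

This relies on the marker line being present in every file; if it can be missing, idx2 will be empty and the subsetting step will fail, so a check on length(idx2) would be worth adding.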
Upvotes: 2