Reputation: 13103
I am trying to extract tables from text files and have found several earlier posts here that address similar questions. However, none seem to work efficiently with my problem. The most helpful answer I have found is to one of my earlier questions here: R: removing header, footer and sporadic column headings when reading csv file
An example dummy text file contains:
>
>
> ###############################################################################
>
> # Display AICc Table for the models above
>
>
> collect.models(, adjust = FALSE)
model npar AICc DeltaAICc weight Deviance
13 P1 19 94 0.00 0.78 9
12 P2 21 94 2.64 0.20 9
10 P3 15 94 9.44 0.02 9
2 P4 11 94 619.26 0.00 9
>
>
> ###############################################################################
>
> # the three lines below count the number of errors in the code above
>
> cat("ERROR COUNT:", .error.count, "\n")
ERROR COUNT: 0
> options(error = old.error.fun)
> rm(.error.count, old.error.fun, new.error.fun)
>
> ##########
>
>
I have written the following code to extract the desired table:
my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')
top <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'
my.data <- my.data[-c(grep(bottom, my.data):length(my.data))]
my.data <- my.data[-c(1:grep(top, my.data))]
my.data <- my.data[c(1:(length(my.data)-4))]
aa <- as.data.frame(my.data)
aa
write.table(my.data, 'c:/users/mmiller21/simple R programs/dummy.log.extraction.txt', quote=F, col.names=F, row.name=F)
my.data2 <- read.table('c:/users/mmiller21/simple R programs/dummy.log.extraction.txt', header = TRUE, row.names = c(1))
my.data2
model npar AICc DeltaAICc weight Deviance
13 P1 19 94 0.00 0.78 9
12 P2 21 94 2.64 0.20 9
10 P3 15 94 9.44 0.02 9
2 P4 11 94 619.26 0.00 9
I would prefer to avoid having to write and then read my.data
to obtain the desired data frame. Prior to that step the current code returns a vector of strings for my.data
:
[1] " model npar AICc DeltaAICc weight Deviance" "13 P1 19 94 0.00 0.78 9"
[3] "12 P2 21 94 2.64 0.20 9" "10 P3 15 94 9.44 0.02 9"
[5] "2 P4 11 94 619.26 0.00 9"
Is there some way I can convert the above vector of strings into a data frame like that in dummy.log.extraction.txt
without writing and then reading my.data
?
The line:
aa <- as.data.frame(my.data)
returns the following, which looks like what I want:
# my.data
# 1 model npar AICc DeltaAICc weight Deviance
# 2 13 P1 19 94 0.00 0.78 9
# 3 12 P2 21 94 2.64 0.20 9
# 4 10 P3 15 94 9.44 0.02 9
# 5 2 P4 11 94 619.26 0.00 9
However:
dim(aa)
# [1] 5 1
If I can split aa
into columns then I think I will have what I want without having to write and then read my.data
.
I found the post: Extracting Data from Text Files However, in the posted answer the table in question seems to have a fixed number of rows. In my case the number of rows can vary between 1 and 20. Also, I would prefer to use base R
. In my case I think the number of rows between bottom
and the last row of the table is a constant (here 4).
I also found the post: How to extract data from a text file using R or PowerShell? However, in my case the column widths are not fixed and I do not know how to split the strings (or rows) so there are only seven columns.
Given all of the above perhaps my question is really how to split the object aa
into columns. Thank you for any advice or assistance.
EDIT:
The actual logs are produced by a supercomputer and contain up to 90,000 lines. However, the number of lines varies greatly among logs. That is why I was making use of top
and bottom
.
Upvotes: 1
Views: 4364
Reputation: 13103
Thank you to those who posted answers. Because of the size, complexity and variability of the actual log files I think I need to continue to make use of the variables top
and bottom
. However, I used elements of dickoa's answer to come up with the following.
my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')
top <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'
my.data <- my.data[-c(grep(bottom, my.data):length(my.data))]
my.data <- my.data[-c(1:grep(top, my.data))]
x <- read.table(text=my.data, comment.char = ">")
x
# model npar AICc DeltaAICc weight Deviance
# 13 P1 19 94 0.00 0.78 9
# 12 P2 21 94 2.64 0.20 9
# 10 P3 15 94 9.44 0.02 9
# 2 P4 11 94 619.26 0.00 9
Here is even simpler code:
my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')
top <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'
my.data <- my.data[grep(top, my.data):grep(bottom, my.data)]
x <- read.table(text=my.data, comment.char = ">")
x
Upvotes: 0
Reputation: 18437
May be your real log file is totally different and more complex but with this one, you can use read.table
directly, you just have to play with the right parameters.
data <- read.table("c:/users/mmiller21/simple R programs/dummy.log",
comment.char = ">",
nrows = 4,
skip = 1,
header = TRUE,
row.names = 1)
str(data)
## 'data.frame': 4 obs. of 6 variables:
## $ model : Factor w/ 4 levels "P1","P2","P3",..: 1 2 3 4
## $ npar : int 19 21 15 11
## $ AICc : int 94 94 94 94
## $ DeltaAICc: num 0 2.64 9.44 619.26
## $ weight : num 0.78 0.2 0.02 0
## $ Deviance : int 9 9 9 9
data
## model npar AICc DeltaAICc weight Deviance
## 13 P1 19 94 0.00 0.78 9
## 12 P2 21 94 2.64 0.20 9
## 10 P3 15 94 9.44 0.02 9
## 2 P4 11 94 619.26 0.00 9
Upvotes: 3
Reputation: 121568
It looks strange that you have to read an R console. Whatever, you can use the fact that your table lines begin with a numeric and extract your inetersting line using something like ^[0-9]+
. Then read.table
like shown by @kohske do the rest.
readLines('c:/users/mmiller21/simple R programs/dummy.log')
idx <- which(grepl('^[0-9]+',ll))
idx <- c(min(idx)-1,idx) ## header line
read.table(text=ll[idx])
model npar AICc DeltaAICc weight Deviance
13 P1 19 94 0.00 0.78 9
12 P2 21 94 2.64 0.20 9
10 P3 15 94 9.44 0.02 9
2 P4 11 94 619.26 0.00 9
Upvotes: 1
Reputation: 66842
read.table
and its family now have an option to read text:
> df <- read.table(text = paste(my.data, collapse = "\n"))
> df
model npar AICc DeltaAICc weight Deviance
13 P1 19 94 0.00 0.78 9
12 P2 21 94 2.64 0.20 9
10 P3 15 94 9.44 0.02 9
2 P4 11 94 619.26 0.00 9
> summary(df)
model npar AICc DeltaAICc weight Deviance
P1:1 Min. :11.0 Min. :94 Min. : 0.00 Min. :0.000 Min. :9
P2:1 1st Qu.:14.0 1st Qu.:94 1st Qu.: 1.98 1st Qu.:0.015 1st Qu.:9
P3:1 Median :17.0 Median :94 Median : 6.04 Median :0.110 Median :9
P4:1 Mean :16.5 Mean :94 Mean :157.84 Mean :0.250 Mean :9
3rd Qu.:19.5 3rd Qu.:94 3rd Qu.:161.90 3rd Qu.:0.345 3rd Qu.:9
Max. :21.0 Max. :94 Max. :619.26 Max. :0.780 Max. :9
Upvotes: 3