Mark Miller

Reputation: 13103

read table with spaces in one column

I am attempting to extract tables from very large text files (computer logs). Dickoa provided very helpful advice to an earlier question on this topic here: extracting table from text file

I modified his suggestion to fit my specific problem and posted my code at the link above.

Unfortunately I have encountered a complication. One column in the table contains spaces, and these spaces generate an error when I run the code at the link above. Is there a way to modify that code, or the read.table call specifically, so that the second column below is read as a single column?

Here is a dummy table in a dummy log:

> collect.models(, adjust = FALSE)
                                                                           model npar      AICc    DeltaAICc       weight  Deviance
5   AA(~region + state + county + city)BB(~region + state + county + city)CC(~1)   17  11111.11    0.0000000 5.621299e-01  22222.22
4                 AA(~region + state + county)BB(~region + state + county)CC(~1)   14  22222.22    0.0000000 5.621299e-01  77777.77
12                                  AA(~region + state)BB(~region + state)CC(~1)   13  33333.33    0.0000000 5.621299e-01  44444.44
12                                                  AA(~region)BB(~region)CC(~1)    6  44444.44    0.0000000 5.621299e-01  55555.55
> 
> # the three lines below count the number of errors in the code above

Here is the R code I am trying to use. This code works if there are no spaces in the second column, the model column:

my.data <- readLines('c:/users/mmiller21/simple R programs/dummy.log')

top    <- '> collect.models\\(, adjust = FALSE)'
bottom <- '> # the three lines below count the number of errors in the code above'

my.data  <- my.data[grep(top, my.data):grep(bottom, my.data)]

x <- read.table(text=my.data, comment.char = ">")

I believe I must use the variables top and bottom to locate the table in the log because the log is huge, variable and complex. Also, not every table contains the same number of models.

Perhaps a regular expression could take advantage of the AA and the CC(~1) present in every model name, but I do not know how to begin. Thank you for any help, and sorry for the follow-up question; I should have used a more realistic example table in my initial question. I have a large number of logs; otherwise I could just extract and edit the tables by hand. The table itself is an odd object which I have only ever been able to export directly with capture.output, which would probably still leave me with the same problem as above.
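One way the AA ... CC(~1) idea might look is sketched below. It assumes every data row contains exactly one span that starts with "AA(" and ends with "CC(~1)". The sample lines are shortened, hypothetical stand-ins for the real log, and the column names passed to read.table are made up for illustration.

```r
# Hypothetical, shortened data rows standing in for lines read from the log.
rows <- c(
  "5   AA(~region + state)BB(~region + state)CC(~1)   13  33333.33",
  "4                   AA(~region)BB(~region)CC(~1)    6  44444.44"
)

# Locate the span running from "AA(" through "CC(~1)" on each line.
m <- regexpr("AA\\(.*CC\\(~1\\)", rows)

# Replace each matched span with a copy that has its spaces removed,
# so the model name becomes a single whitespace-free field.
regmatches(rows, m) <- gsub(" ", "", regmatches(rows, m), fixed = TRUE)

# Illustrative column names (assumptions, not taken from the log).
x <- read.table(text = rows,
                col.names = c("id", "model", "npar", "AICc"),
                stringsAsFactors = FALSE)
print(x)
```

Lines that do not contain the span are left unchanged by the regmatches assignment, so this could be applied to the extracted block as a whole.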

EDIT:

All spaces seem to come right before and right after a plus sign. Perhaps that information can be used here to fill the spaces or remove them.

Upvotes: 0

Views: 608

Answers (1)

Ricardo Saporta

Reputation: 55350

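Try collapsing the spaces around each plus sign before calling read.table. Because my.data is a character vector returned by readLines, it has no model column to address with $, so apply gsub to the whole vector:

my.data  <- my.data[grep(top, my.data):grep(bottom, my.data)]

my.data <- gsub(" *\\+ *", "+", my.data)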

x <- read.table(text=my.data, comment.char = ">")
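As a self-contained sketch of this approach, the snippet below replaces the file with a few hypothetical, shortened log lines so it can be run directly. It applies gsub to the whole character vector and, as an assumption of mine rather than part of the original code, matches the markers with fixed = TRUE so that no regex escaping is needed. The row labels are kept unique here; the dummy table in the question repeats the label 12, which read.table would likely reject as duplicate row names.

```r
# Hypothetical log lines standing in for the output of readLines().
my.data <- c(
  "some earlier log output",
  "> collect.models(, adjust = FALSE)",
  "                              model npar     AICc",
  "5  AA(~region + state)BB(~1)CC(~1)   13 33333.33",
  "4          AA(~region)BB(~1)CC(~1)    6 44444.44",
  "> ",
  "> # the three lines below count the number of errors in the code above",
  "later log output"
)

top    <- "> collect.models(, adjust = FALSE)"
bottom <- "> # the three lines below count the number of errors in the code above"

# fixed = TRUE treats the markers as literal strings rather than regexes.
start <- grep(top, my.data, fixed = TRUE)[1]
end   <- grep(bottom, my.data, fixed = TRUE)[1]
block <- my.data[start:end]

# Collapse " + " to "+" so each model name is one whitespace-free field.
block <- gsub(" *\\+ *", "+", block)

# Lines starting with ">" are treated as comments and skipped.
x <- read.table(text = block, comment.char = ">", stringsAsFactors = FALSE)
print(x)
```

Since the header row has one fewer field than the data rows, read.table treats the first field of each row as a row name, which matches how the table is printed in the log.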

Upvotes: 1
