Reputation: 13
I'm trying to open a `*.txt` file that begins with n empty lines, and I want those empty lines to be read as `NA`.
I'm using the `read.table()` function with the `blank.lines.skip = FALSE` argument.
If the file has fewer than 5 empty lines, it opens correctly with the proper number of `NA` lines. If it has 5 or more empty lines, I get the following error: `empty beginning of file`.
How can I allow my file to have as many empty lines as it wants and still get the proper number of `NA` lines?
I would greatly appreciate any help and advice. Thanks!
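A minimal way to reproduce the problem is sketched below. The helper `try_read` and the two-column sample data are made up for illustration. The sketch writes files with different numbers of leading blank lines to a temporary location, calls `read.table()` on each, and catches any error. According to the question, the call with fewer than 5 blank lines should return a data frame and the other should fail.

```r
# Hypothetical helper (not part of R): write a file with `n_blank` leading
# blank lines followed by two rows of data, then try to read it back.
try_read <- function(n_blank) {
  path <- tempfile(fileext = ".txt")
  writeLines(c(rep("", n_blank), "1 2", "3 4"), path)
  tryCatch(
    read.table(path, blank.lines.skip = FALSE),
    # On failure, return the error message instead of stopping
    error = function(e) conditionMessage(e)
  )
}

print(try_read(3))  # fewer than 5 leading blank lines
print(try_read(6))  # 5 or more leading blank lines
```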
Upvotes: 1
Views: 2977
Reputation: 753
As PavoDive mentioned, the number 5 is hard-coded into the definition of the base R function `read.table`. If you really want to read in the blank rows, you'll need to make a temporary copy of the function that uses a different value.
Here's one way to do it. Type `fix(read.table)` into the console. In RStudio this opens another window that shows the code behind `read.table` and lets you edit it. Change the 5 in line 34 to a number greater than the number of leading blank rows in your file (the exact line number may differ between R versions). For example, I changed it to 6.
When you hit "Save", a temporary function named `read.table` appears in your current R environment and masks the original. If you delete that object, clear your environment, or restart your R session, the modified copy disappears and you are back to the original base R `read.table`, which has 5 in line 34. Now try reading your file again. It should load with the proper number of leading blank (`NA`) rows.
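If you'd rather not edit `read.table` by hand in every session, there is a different workaround from the `fix()` approach described here. You count the leading blank lines yourself, let `read.table` skip them, and then prepend that many all-`NA` rows. The sketch below assumes this behavior is what you want. The helper name `read_with_leading_blanks` is hypothetical.

```r
# Hypothetical helper (not part of base R): read a file whose leading blank
# lines should become all-NA rows, without modifying read.table itself.
read_with_leading_blanks <- function(path, ...) {
  all_lines <- readLines(path)
  # Positions of lines that contain at least one non-whitespace character
  non_blank <- which(nzchar(trimws(all_lines)))
  if (length(non_blank) == 0L) stop("file contains only blank lines")
  n_blank <- non_blank[1L] - 1L
  # Read everything after the leading blank lines; `...` must not repeat
  # blank.lines.skip, which is fixed to FALSE here
  body_df <- read.table(path, skip = n_blank, blank.lines.skip = FALSE, ...)
  if (n_blank == 0L) return(body_df)
  # Indexing rows with NA yields all-NA rows with the same column types
  na_rows <- body_df[rep(NA_integer_, n_blank), , drop = FALSE]
  out <- rbind(na_rows, body_df)
  rownames(out) <- NULL
  out
}

# Usage: 7 leading blank lines followed by two rows of data
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("", 7), "1 2", "3 4"), tmp)
print(read_with_leading_blanks(tmp))
```

Because the blank lines are never passed to `read.table`, the five-line look-ahead is not an issue, however many leading blank lines the file has.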
Upvotes: 2
Reputation: 6496
This seems to be the expected behavior of the function. If you type `read.table` (without parentheses), you'll see the code for the function. About a quarter of the way through, you'll find that 5 was (somewhat arbitrarily) chosen as the number of leading lines that are inspected. If all of them are blank, the file is treated as empty. Here is a fragment of the function:
```r
pbEncoding <- if (encoding %in% c("", "bytes", "UTF-8"))
    encoding
else "bytes"
numerals <- match.arg(numerals)
if (skip > 0L)
    readLines(file, skip)
nlines <- n0lines <- if (nrows < 0L)
    5
else min(5L, (header + nrows))
lines <- .External(C_readtablehead, file, nlines, comment.char,
    blank.lines.skip, quote, sep, skipNul)
if (encoding %in% c("UTF-8", "latin1"))
    Encoding(lines) <- encoding
nlines <- length(lines)
if (!nlines) {
    if (missing(col.names))
        stop("no lines available in input")
    rlabp <- FALSE
    cols <- length(col.names)
```

and further down:

```r
else if (missing(col.names))
    col.names <- paste0("V", 1L:cols)
if (length(col.names) + rlabp < cols)
    stop("more columns than column names")
if (fill && length(col.names) > cols)
    cols <- length(col.names)
if (!fill && cols > 0L && length(col.names) > cols)
    stop("more column names than columns")
if (cols == 0L)
    stop("first five rows are empty: giving up")
}
if (check.names)
    col.names <- make.names(col.names, unique = TRUE)
```
What's the important point here? You can view the code of most R functions and work out why they behave the way they do.
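Instead of scrolling through the printed source, you can deparse the function into a character vector and search it. The sketch below assumes an R version in which `read.table` still contains the `min(5L, ...)` look-ahead and the "first five rows are empty" message. The indices it reports are positions in the `deparse()` output and need not match the line numbers shown by `fix()`.

```r
# Turn the function definition into a character vector, one element per line
src <- deparse(utils::read.table)

# Locate the five-line look-ahead and the related error message
lookahead_idx <- grep("min(5L", src, fixed = TRUE)
message_idx   <- grep("first five rows are empty", src, fixed = TRUE)

# Show the matching lines together with their positions in `src`
hits <- c(lookahead_idx, message_idx)
print(data.frame(line = hits, code = trimws(src[hits])))
```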
Upvotes: 2