Elise B

Reputation: 13

Error “empty beginning of file” from read.table in R when the file starts with 5 or more empty lines

I'm trying to open a *.txt file that begins with n empty lines, and I want each empty line to be read as a row of NA.

I'm using the read.table() function with the argument blank.lines.skip = FALSE. If the file starts with fewer than 5 empty lines, it opens correctly with the proper number of NA rows, but if it starts with 5 or more empty lines I get the following error: empty beginning of file.
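A minimal reproduction, as a sketch: the temporary files and their contents are made up for illustration, and the exact error text may differ between R versions.

```r
# Hypothetical example data: n_blank empty lines followed by two data rows.
make_file <- function(n_blank) {
  path <- tempfile(fileext = ".txt")
  writeLines(c(rep("", n_blank), "1 2 3", "4 5 6"), path)
  path
}

# Four leading empty lines: expected to read, with the blanks as NA rows.
few <- tryCatch(
  read.table(make_file(4), blank.lines.skip = FALSE),
  error = function(e) e
)

# Five leading empty lines: expected to fail.
many <- tryCatch(
  read.table(make_file(5), blank.lines.skip = FALSE),
  error = function(e) e
)

print(few)
if (inherits(many, "error")) print(conditionMessage(many))
```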

How can I allow my file to start with any number of empty lines and still get the proper number of NA rows?

I would greatly appreciate any help and advice. Thanks!

Upvotes: 1

Views: 2977

Answers (2)

Boops Boops

Reputation: 753

As PavoDive has mentioned, the number 5 is hard-coded into the definition of the base R function read.table. If you really want to read in the blank rows, you'll need to make a temporary version of the function that uses a different value.

Here's one way to do it. Type fix(read.table) into the console. In RStudio this opens up another window which shows you the code behind read.table, and allows you to make changes. Change the 5 in line 34 to a number greater than the number of leading blank rows in your file. For example, I changed it to 6:

[screenshot of Edit window]

When you hit "Save", you'll see a temporary function named read.table in your current R environment. (If you delete that object, clear your environment, or restart your R session, that temporary modified version of read.table will disappear and you will be back to using the original base R version of read.table that has 5 in line 34.) Now try reading in your file. It should be able to read your file into a table with the proper number of leading blank rows.
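If editing a copy of a base R function feels too fragile, a different approach (not the fix() edit described above) is to count the leading blank lines yourself, read the rest of the file normally, and prepend that many NA rows. A minimal sketch, assuming the file fits in memory; the helper name read_table_keep_leading_blanks is hypothetical, and extra arguments are simply passed through to read.table:

```r
# Hypothetical helper: read a text file with read.table(), keeping any
# number of leading blank lines as all-NA rows.
read_table_keep_leading_blanks <- function(path, ...) {
  lines <- readLines(path)
  is_blank <- grepl("^\\s*$", lines)
  if (all(is_blank)) stop("file contains only blank lines")

  # Number of blank lines before the first line with content.
  n_lead <- which(!is_blank)[1] - 1L

  # Parse everything from the first non-blank line onwards.
  # Do not pass blank.lines.skip via '...'; it is already set here.
  body <- lines[(n_lead + 1L):length(lines)]
  dat <- read.table(text = body, blank.lines.skip = FALSE, ...)
  if (n_lead == 0L) return(dat)

  # Indexing rows with NA yields all-NA rows that keep the column types.
  pad <- dat[rep(NA_integer_, n_lead), , drop = FALSE]
  out <- rbind(pad, dat)
  rownames(out) <- NULL
  out
}

# Usage with made-up data: seven blank lines, then two data rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("", 7), "1 2 3", "4 5 6"), tmp)
res <- read_table_keep_leading_blanks(tmp)
print(res)
```

This leaves read.table itself untouched, so it does not depend on the line number of the hard-coded 5, which can shift between R versions.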

Upvotes: 2

PavoDive

Reputation: 6496

This seems to be the expected behavior of the function. If you type read.table at the console (without parentheses), R prints the function's code. About a quarter of the way through, you'll find that 5 was (somewhat arbitrarily) chosen as the number of leading lines to inspect before the file is considered empty. Here is a fragment of the function:

    pbEncoding <- if (encoding %in% c("", "bytes", "UTF-8"))
        encoding
    else "bytes"
    numerals <- match.arg(numerals)
    if (skip > 0L) 
        readLines(file, skip)
    nlines <- n0lines <- if (nrows < 0L) 
        5
    else min(5L, (header + nrows))
    lines <- .External(C_readtablehead, file, nlines, comment.char, 
        blank.lines.skip, quote, sep, skipNul)
    if (encoding %in% c("UTF-8", "latin1")) 
        Encoding(lines) <- encoding
    nlines <- length(lines)
    if (!nlines) {
        if (missing(col.names)) 
            stop("no lines available in input")
        rlabp <- FALSE
        cols <- length(col.names)

and

        else if (missing(col.names))
            col.names <- paste0("V", 1L:cols)
        if (length(col.names) + rlabp < cols) 
            stop("more columns than column names")
        if (fill && length(col.names) > cols) 
            cols <- length(col.names)
        if (!fill && cols > 0L && length(col.names) > cols) 
            stop("more column names than columns")
        if (cols == 0L) 
            stop("first five rows are empty: giving up")
    }
    if (check.names) 
        col.names <- make.names(col.names, unique = TRUE)

What's the important point here? That you can view the code of most R functions and so understand why they behave the way they do.
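As a rough sketch of that kind of inspection, the function can be deparsed into text and searched, rather than read by eye. The variable names are made up for illustration, and the matching lines (and their positions) may differ between R versions:

```r
# Deparse the function into a character vector, one element per line of code.
src <- deparse(utils::read.table)

# Lines that contain a bare 5 or 5L: candidates for the hard-coded threshold.
threshold_lines <- grep("\\b5L?\\b", src, perl = TRUE)
print(src[threshold_lines])

# Lines that raise the errors discussed here; the wording may vary by version.
print(grep("empty beginning|giving up", src, value = TRUE))
```

Note that the positions in the deparsed text need not match the line numbers shown in an editor window opened with fix().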

Upvotes: 2
