user2551551

How to skip comment lines in a data file I want to import with R

I have many string files (.str) that I want to import into R by looping over the files. The problem is that the first line is neither the column names nor the start of the matrix; it is a comment line. The same goes for the last line. The matrix I want to import sits between those two lines. How can I do that?

Thanks

Upvotes: 3

Views: 14187

Answers (3)

posdef

Reputation: 6532

You can skip arbitrary lines anywhere in the file by combining the readLines approach Hong Ooi gives with negative indexing. Here's an example that skips lines 2-5 in a file that has a header row followed by a number of lines of annotation/meta info:

lines <- readLines('myfile.txt')
mytable <- read.table(text = lines[-(2:5)], sep = '\t', header = TRUE)
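
To make the idea above easy to try, here is a minimal self-contained sketch. The temporary file and its contents (the id and value columns and the four annotation lines) are made up for illustration; only readLines, negative indexing, and read.table come from the answer itself.

```r
# Build a small tab-separated demo file (contents are hypothetical):
# a header on line 1, four annotation lines, then two data rows.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tvalue",
  "annotation a",
  "annotation b",
  "annotation c",
  "annotation d",
  "1\t10",
  "2\t20"
), path)

lines <- readLines(path)

# Negative indexing drops lines 2 to 5 and keeps the header on line 1.
mytable <- read.table(text = lines[-(2:5)], sep = "\t", header = TRUE)
print(mytable)
```

The same indexing can drop the first and last lines instead, for example lines[-c(1, length(lines))], which matches the layout described in the question.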

Upvotes: 0

k.c.

Reputation: 116

You can put comments anywhere in a data file in the same way that you put comments in an R script. For example, if I have a data.txt like this:

# comment 1
str1
str2
# comment 2
str3
# comment 3
str4
str5# comment 4
str6
str7
# comment 5

Then you don't need to do anything to skip the comments:

> x<-read.table("data.txt", header=FALSE)
> x
    V1
1 str1
2 str2
3 str3
4 str4
5 str5
6 str6
7 str7
>

Note that comment 4 is not read either, even though it follows data on the same line. You can change the comment character from the default # by using the comment.char argument of read.table.
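
As a rough sketch of the comment.char argument, suppose the comment lines start with % rather than #. That choice of character, and the inline data below, are assumptions for illustration; the question does not say how the comment lines in the .str files begin, and this approach only helps if they start with one consistent character.

```r
# Hypothetical file contents, supplied inline through the text argument.
raw <- c(
  "% leading comment",
  "a 1",
  "b 2% trailing comment on a data line",
  "% closing comment"
)

# Everything from "%" to the end of each line is ignored,
# and lines left empty by that are skipped.
x <- read.table(text = raw, header = FALSE, comment.char = "%")
print(x)
```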

Upvotes: 6

Hong Ooi

Reputation: 57686

If you want to skip the first and last lines in a file, you can do it as follows. Use readLines to read the file into a character vector, and then pass it to read.csv.

strs <- readLines("filename.csv")
dat <- read.csv(text=strs,             # read from an R object rather than a file
                skip=1,                # skip the first line
                nrows=length(strs) - 3 # skip the last line
                )

The - 3 is there because the number of data rows is 3 less than the number of lines of text in the file: 1 skipped line at the beginning, 1 line of column headers, and 1 skipped line at the end. Of course, you could also just omit the nrows argument and delete the nonsense row from your data frame after the import.
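
The question describes files with no header row and asks about looping over many of them. With header=FALSE the subtraction above would be 2 rather than 3. One possible way to handle the loop is sketched below: it drops the first and last lines by negative indexing after readLines, rather than using skip and nrows. The helper name read_trimmed, the demo directory, and the file contents are all hypothetical.

```r
# Hypothetical helper: read one file, dropping its first and last lines.
# Extra arguments are passed through to read.table.
read_trimmed <- function(path, ...) {
  lines <- readLines(path)
  stopifnot(length(lines) >= 3)  # need at least one data line
  read.table(text = lines[-c(1, length(lines))], ...)
}

# Demo data: two small .str files in a temporary directory (made up).
demo_dir <- tempfile("strdemo")
dir.create(demo_dir)
for (name in c("a.str", "b.str")) {
  writeLines(c("leading comment", "1 2 3", "4 5 6", "trailing comment"),
             file.path(demo_dir, name))
}

# Loop over every .str file and keep each result as a matrix.
files <- list.files(demo_dir, pattern = "\\.str$", full.names = TRUE)
mats <- lapply(files, function(f) as.matrix(read_trimmed(f, header = FALSE)))
names(mats) <- basename(files)
```

Each element of mats is then one imported matrix, named after its source file.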

Upvotes: 6
