Reputation: 4470
I have a text file that always has a header line (which does not start with "#"). There may be some lines before the header, each starting with "#". There can also be lines within the data that start with "#".
I need to identify the "#" lines that come before the header and skip them when reading the file.
data
#version 2.4
##
## Oncotator v1.0.0.0rc16| Gaf 3.0 | UniProt_AAxform 2011_09
## OxoG Filter v3
Hugo_Symbol Entrez_Gene_Id
BAGE1 0
BAGE1 0
#errt 23
RTRRT 23
I want to skip the 4 leading lines and read the file with its header. I tried:
dum.data<-readLines(filename)
top<-"^#"
if (grepl(top, dum.data[1])) {
ret <- grep(top,dum.data)
}
But this matches every "#" line. I need to identify only the "#" lines (if any) before the header, not the ones in between the data.
Reputation: 42649
Check for leading comment lines by using rle and diff. Remove only the first group, and only if it precedes any non-comment lines:
r <- rle(diff(grep('^#', dum.data)))
dum.data <- if (length(r$values) && r$values[1] == 1) tail(dum.data, -(r$lengths[1]+1)) else dum.data
dum.data
## [1] "Hugo_Symbol Entrez_Gene_Id"
## [2] "BAGE1 0"
## [3] "BAGE1 0"
## [4] "#errt 23"
## [5] "RTRRT 23"
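As written, the rle/diff check only fires when there are at least two consecutive "#" lines, and it does not verify that the run starts at line 1. A single leading comment line is left in place, and a header followed later by two adjacent "#" data lines would lose its first rows. A minimal alternative sketch, assuming the header is simply the first line that does not start with "#" (the helper name skip_leading_comments is hypothetical, not from the original answer):

```r
# Hypothetical helper: drop the run of comment lines at the top of the
# input and keep everything from the first non-comment line (the header) on.
skip_leading_comments <- function(lines, pattern = "^#") {
  is_comment <- grepl(pattern, lines)
  if (all(is_comment)) return(character(0))  # empty input or comments only
  first_kept <- which(!is_comment)[1]        # index of the header line
  lines[first_kept:length(lines)]
}

# Sample input copied from the question
dum.data <- c("#version 2.4",
              "##",
              "## Oncotator v1.0.0.0rc16| Gaf 3.0 | UniProt_AAxform 2011_09",
              "## OxoG Filter v3",
              "Hugo_Symbol Entrez_Gene_Id",
              "BAGE1 0",
              "BAGE1 0",
              "#errt 23",
              "RTRRT 23")

trimmed <- skip_leading_comments(dum.data)
```

Here trimmed starts at the header and still contains the "#errt 23" row, because only the lines before the first non-comment line are dropped.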
Then use this to initialize a textConnection and read the table.
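A minimal sketch of that last step, assuming the columns are whitespace-separated and that rows such as "#errt 23" should be kept as data. read.table treats "#" as a comment character by default, so comment.char = "" is set here to stop it from discarding those rows. The variable names trimmed and df are illustrative only:

```r
# Lines left after the leading "#" block has been removed
trimmed <- c("Hugo_Symbol Entrez_Gene_Id",
             "BAGE1 0",
             "BAGE1 0",
             "#errt 23",
             "RTRRT 23")

con <- textConnection(trimmed)
df <- read.table(con,
                 header = TRUE,            # first remaining line is the header
                 comment.char = "",        # keep data rows that start with "#"
                 stringsAsFactors = FALSE)
close(con)
```

Passing the lines directly with read.table(text = trimmed, ...) should behave the same way without managing the connection by hand.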