Sebastian Zeki

Reputation: 6874

How to remove text before the header in R

I have a text file I want to work on. However, before the dataset I want to import there is a load of text that I want to strip away. Each line of the text I want to remove starts with an @ symbol, but the number of these pre-data lines can vary. How can I strip away this pre-data text? For example:

@HD VN:1.0  SO:unsorted
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@PG ID:Bowtie   VN:1.1.0    
Number chr locus
 1      1    10092
 2      12   12313

Upvotes: 0

Views: 358

Answers (2)

MrFlick

Reputation: 206616

If there are no instances of "@" that you want to keep, you can set "@" as the comment character to skip those lines:

read.table("data.txt", comment.char = "@", header = TRUE)
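A self-contained sketch of the same idea, in case it helps to try it without the real file. The temporary file and its shortened contents are assumptions made up for illustration, modelled on the sample in the question.

```r
# Write a small sample file that mimics the layout in the question
# (file name and contents are illustrative assumptions).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "@HD VN:1.0  SO:unsorted",
  "@SQ SN:chr1 LN:249250621",
  "@PG ID:Bowtie   VN:1.1.0",
  "Number chr locus",
  " 1      1    10092",
  " 2      12   12313"
), tmp)

# With comment.char = "@", everything from "@" to the end of a line is ignored.
# Lines that start with "@" become empty and are skipped as blank lines.
dat <- read.table(tmp, comment.char = "@", header = TRUE)
str(dat)
```

Note that this also truncates any data row at the first "@" it contains, which is why it only fits files where "@" never appears in the data itself.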

Upvotes: 4

r2evans

Reputation: 161155

Two quick methods:

Inefficient but All "R"

wholeFile <- readLines('user3632206.csv')
partialFile <- wholeFile[! grepl('^@', wholeFile)]
parsedFile <- read.table(text = partialFile, header = TRUE)
parsedFile
##   Number chr locus
## 1      1   1 10092
## 2      2  12 12313
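A possible variation, in case "@" might also turn up further down in the data: count only the leading run of "@" lines and hand that count to skip. This is a sketch under assumptions; the temporary file, the extra tag column, and the names nSkip and parsedSkip are made up for illustration.

```r
# Sample file: three "@" header lines, then a table whose last column
# contains "@" inside the data (contents are illustrative assumptions).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "@HD VN:1.0  SO:unsorted",
  "@SQ SN:chr1 LN:249250621",
  "@PG ID:Bowtie   VN:1.1.0",
  "Number chr locus tag",
  " 1      1    10092 x@1",
  " 2      12   12313 y@2"
), tmp)

wholeFile <- readLines(tmp)

# cumprod() stays at 1 while lines start with "@" and drops to 0 at the first
# line that does not, so the sum is the length of the leading "@" block.
nSkip <- sum(cumprod(grepl("^@", wholeFile)))

# Skip exactly that many lines and turn off comment handling so that
# "@" inside the data is kept as-is.
parsedSkip <- read.table(tmp, skip = nSkip, header = TRUE, comment.char = "")
parsedSkip
```

The file is read twice here, so this trades a little speed for not touching any "@" that appears after the header block.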

Preprocess on the Command-Line

On the command line, using grep (native on macOS and Unix; available for Windows in several packages, including Cygwin, MSYS, and MSYS2):

bash$ grep -v '^@' user3632206.csv > user3632206-filtered.csv

And in R:

parsedFile <- read.table('user3632206-filtered.csv', header = TRUE)
parsedFile
##   Number chr locus
## 1      1   1 10092
## 2      2  12 12313
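If writing an intermediate file is not wanted, the two steps can probably be combined by reading from a pipe. This is only a sketch: it assumes grep is available on the PATH, and the temporary file and the name parsedPipe are made up for illustration.

```r
# Sample file modelled on the question (contents are illustrative assumptions).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "@HD VN:1.0  SO:unsorted",
  "@SQ SN:chr1 LN:249250621",
  "@PG ID:Bowtie   VN:1.1.0",
  "Number chr locus",
  " 1      1    10092",
  " 2      12   12313"
), tmp)

# Build the shell command; shQuote() protects paths with spaces.
cmd <- paste("grep -v '^@'", shQuote(tmp))

# pipe() runs the command and read.table() parses its output directly.
# read.table() opens the connection itself and closes it when done.
parsedPipe <- read.table(pipe(cmd), header = TRUE)
parsedPipe
```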

Upvotes: 2
