Reputation: 6874
I have a text file I want to work on. However, before the dataset I want to import there is a block of text that I want to strip away. Each line of the text I want to remove starts with an @ symbol, but the number of these pre-data lines can vary. How can I strip away this pre-data text? For example:
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@PG ID:Bowtie VN:1.1.0
Number chr locus
1 1 10092
2 12 12313
Upvotes: 0
Views: 358
Reputation: 206616
If there are no instances of "@" that you want to keep, you can set "@" as the comment character so that those lines are skipped:
read.table("data.txt", comment.char = "@", header = TRUE)
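A minimal, self-contained sketch of this approach, assuming a shortened copy of the sample from the question written to a temporary file (the variable names sample_lines, tmp and dat are illustrative only). One caveat worth keeping in mind: comment.char applies anywhere on a line, so anything following an "@" inside a data row would be dropped as well.

```r
# Hypothetical, shortened sample mimicking the file in the question
sample_lines <- c(
  "@HD VN:1.0 SO:unsorted",
  "@SQ SN:chr1 LN:249250621",
  "@PG ID:Bowtie VN:1.1.0",
  "Number chr locus",
  "1 1 10092",
  "2 12 12313"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Everything from an "@" to the end of the line is treated as a comment,
# so the header block is skipped regardless of how many lines it has
dat <- read.table(tmp, comment.char = "@", header = TRUE)
str(dat)

unlink(tmp)  # remove the temporary file
```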
Upvotes: 4
Reputation: 161155
Two quick methods. In R, drop the lines that start with "@" before parsing:
wholeFile <- readLines('user3632206.csv')
partialFile <- wholeFile[! grepl('^@', wholeFile)]
parsedFile <- read.table(textConnection(partialFile), header = TRUE)
parsedFile
## Number chr locus
## 1 1 1 10092
## 2 2 12 12313
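If "@" could also appear at the start of a line later in the data, a variant is to skip only the leading block. The sketch below is one possible way to do that, assuming the file contains at least one line that does not start with "@"; the sample data and the names sample_lines, tmp, first_data and dat are made up for illustration.

```r
# Hypothetical, shortened sample mimicking the file in the question
sample_lines <- c(
  "@HD VN:1.0 SO:unsorted",
  "@SQ SN:chr1 LN:249250621",
  "@PG ID:Bowtie VN:1.1.0",
  "Number chr locus",
  "1 1 10092",
  "2 12 12313"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

all_lines <- readLines(tmp)

# Position of the first line that does not start with "@"
# (assumes such a line exists; otherwise this is NA)
first_data <- which(!grepl("^@", all_lines))[1]

# Skip everything before that line; disable comment handling
# so later "@" characters are left alone
dat <- read.table(tmp, skip = first_data - 1, header = TRUE, comment.char = "")
dat

unlink(tmp)  # remove the temporary file
```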
On the command line, using grep (native on macOS and Unix; available for Windows in several packages, including Cygwin, MSYS, and MSYS2):
bash$ grep -v '^@' user3632206.csv > user3632206-filtered.csv
And in R:
parsedFile <- read.table('user3632206-filtered.csv', header = TRUE)
parsedFile
## Number chr locus
## 1 1 1 10092
## 2 2 12 12313
Upvotes: 2