The file in question is a pileup file from RNA-seq. I want to extract the rows for one chromosome. This has worked for smaller files:
awk '/chrM/ { print }' file1.pileup > file1.chrm.pileup
The error message:
awk: (FILENAME=file1.pileup FNR=1743118775) fatal: grow_iop_buffer: iop->buf: can't allocate 137438953474 bytes of memory (Cannot allocate memory)
Is there an alternative command, or an option, that gets around this?
Thanks for any help.
Edit:
The data looks like this:
chr1 258755 T 1 . F
chr1 258756 C 1 ...... F
chr1 258757 T 1 ... H
chr1 258758 A 1 ........... H
The file is 3529769718150 bytes (about 3.5 TB).
I expect to find rows like these (basically a block of rows roughly 70-75% of the way through the file):
chrM 6432 C 1 ^~. B
chrM 7294 A 1 ........ B
chrM 7296 G 1 ..... B
Edit2:
Output of 'head -n 1 File1 | od -c':
0000000 c h r 1 \t 2 5 8 7 4 9 \t T \t 1 \t
0000020 ^ ~ . \t C \n
0000026
Output of 'head -c xxx File1 | od -c':
head: xxx: invalid number of bytes
0000000
Output of 'head -c 100 File1 | od -c':
0000000 c h r 1 \t 2 5 8 7 4 9 \t T \t 1 \t
0000020 ^ ~ . \t C \n c h r 1 \t 2 5 8 7 5
0000040 0 \t T \t 1 \t . \t C \n c h r 1 \t 2
0000060 5 8 7 5 1 \t T \t 1 \t G \t C \n c h
0000100 r 1 \t 2 5 8 7 5 2 \t T \t 1 \t . \t
0000120 F \n c h r 1 \t 2 5 8 7 5 3 \t C \t
0000140 1 \t . \t
0000144
Upvotes: 2
Reputation: 2794
It sounds like your awk might not be able to deal with files larger than about 2 GB (2^31 bytes), because 32-bit offsets can't address anything beyond that.
Try running
split --line-bytes=2GB file1.pileup
This will split your file into pieces of at most 2 GB of lines each (for a file of roughly 3.5 TB, at least 1,765 pieces), which you should then be able to process one at a time.
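A minimal sketch of the split-then-filter approach, assuming GNU coreutils `split` and enough free disk space to hold a full copy of the file. The helper name `filter_by_chunks` is hypothetical. Each piece is filtered with an exact match on the first column, and the matches are concatenated in piece order:

```shell
#!/bin/sh
# Hypothetical helper: split INPUT into pieces of at most CHUNK bytes of lines
# (GNU split --line-bytes), keep rows whose first column equals CHROM, and
# concatenate the matches into OUTPUT. Assumes GNU coreutils and enough free
# disk space for a full temporary copy of INPUT.
filter_by_chunks() {
    input=$1
    chrom=$2
    chunk=$3
    output=$4

    workdir=$(mktemp -d) || return 1
    # Pieces are named "$workdir/part_aa", "$workdir/part_ab", ...; the glob
    # below expands them in the same order, so row order is preserved.
    split --line-bytes="$chunk" -- "$input" "$workdir/part_" || { rm -rf "$workdir"; return 1; }

    for piece in "$workdir"/part_*; do
        [ -e "$piece" ] || continue   # empty input: split wrote no pieces
        awk -v chrom="$chrom" '$1 == chrom' "$piece"
    done > "$output"
    status=$?

    rm -rf "$workdir"
    return "$status"
}

# Usage (for GNU split, "GB" means 1000^3 bytes):
# filter_by_chunks file1.pileup chrM 2GB file1.chrm.pileup
```

Note that GNU `split --line-bytes` breaks any single line longer than the size limit across several pieces instead of keeping it whole.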
Upvotes: 2
Reputation: 246942
I wonder if you'll have better success avoiding regular expressions:
awk '$1 == "chrM"' file1.pileup > file1.chrm.pileup
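The question says the chrM rows form one block roughly 70-75% of the way through the file. If the file is grouped by chromosome (an assumption, not something the question confirms), awk can stop as soon as that block ends instead of reading the rest of the file. A sketch, with a hypothetical helper name `first_block`:

```shell
#!/bin/sh
# Hypothetical helper: print rows whose first column equals CHROM (arg 1) from
# FILE (arg 2), then stop reading at the first non-matching row after them.
# Assumes all rows for CHROM are contiguous; if they are not, rows after the
# first block are silently missed.
first_block() {
    awk -v chrom="$1" '
        $1 == chrom { print; seen = 1; next }  # inside the block: keep the row
        seen        { exit }                   # first row after the block: stop reading
    ' "$2"
}

# Usage:
# first_block chrM file1.pileup > file1.chrm.pileup
```

This only skips the part of the file after the chrM block; a bad region located before it would still be read.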
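I wonder if your file got "corrupted", and somewhere in the file there's one enormous line. The 137438953474 in the error message is exactly 2^37 + 2, which looks like awk repeatedly doubling its input buffer while it searches for the end of a single record. Can you try this: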
awk '{print NR, NF, length($0)}' file1.pileup > file1.line_lengths
And see where it craps out?
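awk has to assemble each record before it can compute `length($0)`, so the command above will still stop at the bad record (its output simply ends there). Since the error message already reports FNR=1743118775, another option is to look at the raw bytes near that record with `tail -n +N` (start at line N) and `head -c` (keep a fixed number of bytes), which should not need to hold a whole line in memory. A sketch, with a hypothetical helper name `peek_from_line`:

```shell
#!/bin/sh
# Hypothetical helper: print the first MAX_BYTES raw bytes of FILE, starting
# at line START_LINE. head -c stops after a fixed byte count, so one enormous
# "line" is never printed in full.
peek_from_line() {
    start_line=$1
    file=$2
    max_bytes=$3
    tail -n "+$start_line" -- "$file" | head -c "$max_bytes"
}

# Usage: show the bytes around the record awk choked on, as characters.
# peek_from_line 1743118770 file1.pileup 2000 | od -c | head -n 40
```

`tail` still has to read sequentially through the first ~1.7 billion lines to get there, so expect it to take a while on a file this size.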
Upvotes: 0
Reputation: 785316
You can just use grep -F (fixed-string search) here instead of awk:
grep -wF 'chrM' file1.pileup > file1.chrm.pileup
If you really want to use awk, a faster and shorter command avoids the regex:
awk 'index($0, "chrM")' file1.pileup > file1.chrm.pileup
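The filters in these answers are not strictly equivalent. A small comparison on sample rows from the question, plus one made-up contig name (`chrM_alt`, hypothetical and used only to show where the filters differ):

```shell
#!/bin/sh
# Sample rows; "chrM_alt" is a made-up contig name for illustration only.
make_sample() {
    printf 'chr1\t258755\tT\t1\t.\tF\n'
    printf 'chrM\t6432\tC\t1\t^~.\tB\n'
    printf 'chrM_alt\t10\tA\t1\t.\tB\n'
}

by_field() { awk '$1 == "chrM"'; }        # exact match on column 1 only
by_word()  { grep -wF 'chrM'; }           # "chrM" as a whole word anywhere on the line
by_index() { awk 'index($0, "chrM")'; }   # "chrM" as a substring anywhere on the line
```

Usage: `make_sample | by_field` and `make_sample | by_word` print only the `chrM` row, while `make_sample | by_index` also prints the `chrM_alt` row, because `_` counts as a word character for `grep -w` but `index()` matches any substring.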
Upvotes: 0