Reputation: 7041

delete lines containing multiple strings in shell

I have a file with ~15k rows/records like this:

$ head -50 skato.tsv 
chr gene    SKATO.pval  SKATO.pmin  rho cmaf    nsnps.gene
chr1    NA  NA  NA  NA  NA  NA
chr1    SAMD11  0.7068  0.5451  0   0.01214 5
chr1    NOC2L   0.09887 0.05592 0   0.1926  8
chr1    KLHL17  0.1262  0.09206 0   0.003241    3
chr1    PLEKHN1 0.01034 0.2067  0   0.5905  11
chr1    HES4    0.02433 0.02433 0   0.002427    1
chr1    ISG15   0.1942  0.1942  1   0.3803  2
chr1    AGRN    0.8922  0.7151  1   0.115   18
chr1    C1orf159    0.5763  0.361   0   0.03485 2
chr1    TTLL10  0.2172  0.1272  0   0.1869  11
chr1    TNFRSF18    0.4014  0.2909  0   0.01379 6
chr1    TNFRSF4 0.1456  0.1179  1   0.001619    2
chr1    SDF4    0.1963  0.1963  0   0.0008104   1

What I want is to remove all lines like the second row:

chrx    NA  NA  NA  NA  NA  NA

This may be easy for many of you here, but I am rather frustrated by it. Could somebody help me out? Thanks.

Upvotes: 0

Answers (4)

potong

Reputation: 58430

This might work for you (GNU sed):

sed -r '/(\s+NA){6}/d' file

This deletes any line that contains six or more consecutive NA fields, each preceded by whitespace.

sed '/\(\s\s*NA\)\{6\}/d' file

This second form avoids the -r option and should also work with most other seds, provided they support \s.
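
Where \s is not available, the same deletion can be sketched with the POSIX class [[:space:]]. The file names sample.tsv and filtered.tsv are made up for this illustration, and the rows are copied from the question:

```shell
# sample.tsv and filtered.tsv are hypothetical names used only for this demo.
# Build a three-line, tab-separated sample from rows shown in the question.
printf 'chr\tgene\tSKATO.pval\tSKATO.pmin\trho\tcmaf\tnsnps.gene\n' > sample.tsv
printf 'chr1\tNA\tNA\tNA\tNA\tNA\tNA\n' >> sample.tsv
printf 'chr1\tSAMD11\t0.7068\t0.5451\t0\t0.01214\t5\n' >> sample.tsv

# Same pattern as the second command, with [[:space:]] in place of \s.
sed '/\([[:space:]][[:space:]]*NA\)\{6\}/d' sample.tsv > filtered.tsv

# Show what survived: the header and the SAMD11 row.
cat filtered.tsv
```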

Upvotes: 1

clt60

Reputation: 63932

I would use:

grep -vP '^chr\d+(\s+NA){6}\s*$' <infile >outfile

Upvotes: 0

Avinash Raj

Reputation: 174716

You could try the sed command below.

sed '/^chr[0-9]\+\([[:blank:]]\+NA\)\+$/d' file

This deletes every line in which chr<number> is followed only by one or more NA fields.
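
To see what the ^ and $ anchors change, here is a small sketch. The files mixed.tsv and kept.tsv and the chr2 row (gene GENEX) are invented for illustration, and \{1,\} is used in place of \+, which is a GNU extension in basic regular expressions:

```shell
# mixed.tsv, kept.tsv and the GENEX row are hypothetical, made up for this demo.
printf 'chr1\tNA\tNA\tNA\tNA\tNA\tNA\n' > mixed.tsv
printf 'chr2\tGENEX\tNA\t0.5\t0\t0.1\t2\n' >> mixed.tsv
printf 'chr1\tHES4\t0.02433\t0.02433\t0\t0.002427\t1\n' >> mixed.tsv

# Anchored at both ends: only lines made of chr<digits> plus NA fields match.
sed '/^chr[0-9]\{1,\}\([[:blank:]]\{1,\}NA\)\{1,\}$/d' mixed.tsv > kept.tsv

# The all-NA row is gone; the GENEX row with a single NA is kept.
cat kept.tsv
```

Because the pattern is anchored, a row that has real values alongside an NA is left alone.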

Upvotes: 1

SMA

Reputation: 37033

Try something like:

egrep -v "chr[0-9]+\s+NA\s+NA" myfile.txt

Or, if you want to stick with sed:

sed -r -i.bak "/chr[0-9]+\s+NA\s+NA/d" myfile.txt ## add one \s+NA for each NA column you want to check for

The -i.bak option creates a backup file (myfile.txt.bak) before the lines are actually deleted.
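
A quick way to check the backup behaviour on a throwaway file is sketched below. demo.tsv is a made-up name; -E is used instead of -r because both GNU and BSD sed accept it, and the NA group is repeated six times to match the all-NA rows from the question:

```shell
# demo.tsv is a hypothetical throwaway file for this demo.
printf 'chr1\tNA\tNA\tNA\tNA\tNA\tNA\n' > demo.tsv
printf 'chr1\tNOC2L\t0.09887\t0.05592\t0\t0.1926\t8\n' >> demo.tsv

# Edit in place; the untouched original is saved as demo.tsv.bak first.
sed -E -i.bak '/chr[0-9]+([[:space:]]+NA){6}/d' demo.tsv

# Print the line count of each file: the edited one and its backup.
grep -c '' demo.tsv demo.tsv.bak
```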

Upvotes: 0
