Reputation: 5059

to remove lines in which first column contains a lower case letter

As the title says, I want to remove the lines from my file in which the first column contains a lowercase letter at any position. Say I have a file like this:

Ar  MA0007.3    3051    2.62674e-220 OVER   0 OVER  0.749924    0.0797918   0.6897  0.682167    -0.0615 [13,23] 1
NR3C1   MA0113.3    3051    6.79534e-208 OVER   0 OVER  0.759705    0.0819166   0.699595    0.686309    -0.0665 [13,23] 0.269309
NR3C2   MA0727.1    3051    7.09295e-206 OVER   0 OVER  0.754749    0.0821368   0.694756    0.681845    -0.067  [13,23] 0.0756584
FOXA1   MA0148.3    3051    5.53402e-91 OVER    0 OVER  0.860904    0.0640026   0.827295    0.792912    -0.0303 [-3,7]  1
Foxa2   MA0047.2    3051    3.00085e-87 OVER    0 OVER  0.864018    0.065624    0.83031 0.796327    -0.0223 [1,11]  1
FOXP1   MA0481.2    3051    3.11057e-79 OVER    0 OVER  0.843207    0.0698783   0.809315    0.779508    -0.0375 [16,26] 1
FOXL1   MA0033.2    3051    1.60328e-77 OVER    0 OVER  0.925715    0.0677064   0.892118    0.854536    -0.1102 [-2,8]  1
FOXO6   MA0849.1    3051    8.95861e-73 OVER    0 OVER  0.892953    0.0741376   0.858344    0.824513    -0.0954 [13,23] 1
FOXK1   MA0852.2    3051    2.82502e-72 OVER    0 OVER  0.820987    0.0652885   0.790887    0.76394 -0.0325 [2,12]  1

What I would like it to print is:

NR3C1   MA0113.3    3051    6.79534e-208 OVER   0 OVER  0.759705    0.0819166   0.699595    0.686309    -0.0665 [13,23] 0.269309
NR3C2   MA0727.1    3051    7.09295e-206 OVER   0 OVER  0.754749    0.0821368   0.694756    0.681845    -0.067  [13,23] 0.0756584
FOXA1   MA0148.3    3051    5.53402e-91 OVER    0 OVER  0.860904    0.0640026   0.827295    0.792912    -0.0303 [-3,7]  1
FOXP1   MA0481.2    3051    3.11057e-79 OVER    0 OVER  0.843207    0.0698783   0.809315    0.779508    -0.0375 [16,26] 1
FOXL1   MA0033.2    3051    1.60328e-77 OVER    0 OVER  0.925715    0.0677064   0.892118    0.854536    -0.1102 [-2,8]  1
FOXO6   MA0849.1    3051    8.95861e-73 OVER    0 OVER  0.892953    0.0741376   0.858344    0.824513    -0.0954 [13,23] 1
FOXK1   MA0852.2    3051    2.82502e-72 OVER    0 OVER  0.820987    0.0652885   0.790887    0.76394 -0.0325 [2,12]  1

and what I am using is:

awk '!/[a-z]/' < file.txt

This somehow leaves out the following rows:

NR3C1   MA0113.3    3051    6.79534e-208 OVER   0 OVER  0.759705    0.0819166   0.699595    0.686309    -0.0665 [13,23] 0.269309
NR3C2   MA0727.1    3051    7.09295e-206 OVER   0 OVER  0.754749    0.0821368   0.694756    0.681845    -0.067  [13,23] 0.0756584

Could anyone please help me fix this?

TIA

Upvotes: 2

Answers (3)

Kent

Reputation: 195109

grep '^[^a-z]\+\s' file

grep is enough for this.

The same with a POSIX character class instead of \s: grep '^[^a-z]\+[[:space:]]' file
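
Note that `\+` in a basic regular expression is a GNU extension, so a more portable spelling is to switch to extended regular expressions with `grep -E`. The following is only a sketch under that assumption; the temporary file and the shortened rows are made up for illustration.

```shell
# Two shortened sample rows in a hypothetical temp file.
sample=$(mktemp)
printf '%s\n' \
  'Foxa2 MA0047.2 3051' \
  'FOXA1 MA0148.3 3051' > "$sample"

# Keep lines that start with one or more non-lowercase characters followed
# by whitespace. A lowercase letter inside the first column prevents the
# anchored match, because every whitespace character comes after it.
kept=$(grep -E '^[^a-z]+[[:space:]]' "$sample")

printf '%s\n' "$kept"
rm -f "$sample"
```

This form still requires whitespace after the first column, so a line consisting of a single column with no trailing whitespace would not be printed.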

Upvotes: 1

RavinderSingh13

Reputation: 133538

The following awk command may help you.

awk '$1!~/[a-z]/'  Input_file

Explanation: this checks that $1 (the first field) does NOT match the regex /[a-z]/, i.e. that it contains no lowercase letter. No action is given, so awk performs its default action, which is printing the current line.

Upvotes: 1

Tom Fenech

Reputation: 74645

You need to match against only the first column using $1 ~ /regex/:

awk '!($1 ~ /[a-z]/)' file

or equivalently:

awk '$1 !~ /[a-z]/' file
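
To see the difference, here is a small sketch comparing the whole-line test from the question with the first-field test. It assumes whitespace-separated input; the temporary file and the shortened rows are only for illustration. The whole-line pattern also sees the lowercase `e` in values such as 6.79534e-208.

```shell
# Two shortened rows from the question in a hypothetical temp file.
sample=$(mktemp)
printf '%s\n' \
  'Ar  MA0007.3    3051    2.62674e-220 OVER' \
  'NR3C1   MA0113.3    3051    6.79534e-208 OVER' > "$sample"

# Whole-line test: the "e" in each exponent is lowercase, so both lines are dropped.
whole_line=$(awk '!/[a-z]/' "$sample")

# First-field test: only $1 is checked, so the NR3C1 line is kept.
first_field=$(awk '$1 !~ /[a-z]/' "$sample")

printf 'whole line:  [%s]\n' "$whole_line"
printf 'first field: [%s]\n' "$first_field"
rm -f "$sample"
```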

Upvotes: 1
