chas

Reputation: 1645

Filter rows based on text in a column

I have a tab-delimited text file as shown below:

27  1   hom het:het    het,het,het,het
18  1   hom het:het    hom,het,het,het,het,het,het
29  1   hom het:het    hom,hom,hom,hom,hom,hom,hom,hom,hom,hom,hom,hom,hom,hom
13  1   hom het:het    het,het,het,het,het,het
21  1   hom het:het    hom,het,het,het,het,het,hom,het,hom,het,het,het,hom
25  1   hom het:het    het,hom,het,het,het
29  1   hom het:het    hom,hom,het,hom,het,het,hom,het,het,hom,het,hom,het,hom
18  1   hom het:het    het,het,het
19  1   hom het:het    het,het,hom,het,het,het,het,het,het,hom,het,het,hom,het

I want to exclude the rows that contain 'hom' in the 5th column, i.e. the output should look like this:

27  1   hom het:het    het,het,het,het
13  1   hom het:het    het,het,het,het,het,het
18  1   hom het:het    het,het,het

How can I do this with a Unix command?

Upvotes: 1

Views: 2716

Answers (2)

jkshah

Reputation: 11703

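Here is an approach using GNU sed (-r enables extended regular expressions; \S, \s and \b are GNU extensions). The pattern is anchored at the start of the line so that exactly four fields are skipped before the 5th column is searched, and \S is used rather than [^\s], because inside a bracket expression \s is not treated as a whitespace class:

sed -r '/^(\S+\s+){4}\S*\bhom\b/d' file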

Output:

27  1   hom het:het    het,het,het,het
13  1   hom het:het    het,het,het,het,het,het
18  1   hom het:het    het,het,het

Upvotes: 0

Chris Seymour

Reputation: 85785

Awk is well suited to this (note that the \< and \> word-boundary operators are GNU awk extensions, so this needs gawk):

$ awk '$5!~/\<hom\>/' file
27  1   hom het:het    het,het,het,het
13  1   hom het:het    het,het,het,het,het,het
18  1   hom het:het    het,het,het

Explanation:

$5         # the fifth column (field)
!~         # negated regex match
/          # start of the regex
\<         # matches the empty string at the beginning of a word
hom        # matches the literal string 'hom'
\>         # matches the empty string at the end of a word
/          # end of the regex
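
If gawk is not available, the same filter can be sketched without word-boundary operators by splitting the 5th column on commas and comparing each value exactly. This is only a sketch under two assumptions: the file really is tab-delimited, and the 5th column is a plain comma-separated list. The names sample, result, calls and keep are made up for the illustration, and the three sample rows are copied from the question.

```shell
# Build a small tab-delimited sample from three rows of the question.
sample=$(mktemp)
printf '27\t1\thom\thet:het\thet,het,het,het\n'      > "$sample"
printf '25\t1\thom\thet:het\thet,hom,het,het,het\n' >> "$sample"
printf '18\t1\thom\thet:het\thet,het,het\n'         >> "$sample"

# Keep a row only if none of the comma-separated values in column 5 is exactly "hom".
result=$(awk -F'\t' '
{
    n = split($5, calls, ",")        # break column 5 into its individual values
    keep = 1
    for (i = 1; i <= n; i++)
        if (calls[i] == "hom") {     # exact comparison, so no word boundaries are needed
            keep = 0
            break
        }
    if (keep) print
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Setting the field separator to a tab also means the 'hom' in the 3rd column is never examined, since only $5 is split and tested.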

Upvotes: 5
