benhid

Reputation: 134

Delete rows that match pattern by id

I have the following file containing n rows:

>name.1_i4_xyz_n
>name.1_i1_xyz_n
>name.1_i1_xyz_n
>name.1_i1_xyz_m
>name.1_i2_xyz_n
>name.1_i2_xyz_m
>name.1_i7_xyz_m
>name.1_i4_xyz_n
...

I want to delete rows that end with m. For the example above, the output would be:

>name.1_i4_n
>name.1_i4_n
...

Note that I've deleted i2 as it has two records and one of them ends with m. Same with i1.

Any help? I want to keep it simple and do it with just one line of code. This is what I have so far:

$ grep "i._.*." < input.txt | sort -k 2 -t "_" | cut -d'_' -f1,2,4
>name.1_i1_m
>name.1_i1_n
>name.1_i1_n
>name.1_i2_m
>name.1_i2_n
>name.1_i4_n
>name.1_i4_n
>name.1_i7_m
...

Upvotes: 0

Views: 124

Answers (4)

Claes Wikner

Reputation: 1517

Another awk proposal.

awk '/_i4/&&!/_m$/' filterm.awk

>name.1_i4_xyz_n
>name.1_i4_xyz_n

Upvotes: 0

James Brown

Reputation: 37464

To delete rows that end with m:

$ grep -v 'm$' file
>name.1_i4_xyz_n
>name.1_i1_xyz_n
>name.1_i1_xyz_n
>name.1_i2_xyz_n
>name.1_i4_xyz_n

Another solution that handles the ids, using awk and two passes over the file:

$ awk 'BEGIN { FS="_" }  # set delimiter
NR==FNR {                # on the first run 
    if($0~/m$/)          # if it ends in an m
        d[$2]            # make a del array entry of that index
    next
}
!($2 in d)' file file    # on the second run, print only rows whose index is not in the del array
>name.1_i4_xyz_n
>name.1_i4_xyz_n

One-liner version:

$ awk 'BEGIN{FS="_"}NR==FNR{if($0~/m$/)d[$2];next}!($2 in d)' file file
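
The expected output in the question also drops the third field (xyz). A minimal sketch, assuming the same `_`-delimited layout and reusing the `cut -d'_' -f1,2,4` step from the question's own attempt, pipes the same two-pass logic (written here with `!($2 in d)`) into cut. The temporary file path is hypothetical and only there so the snippet runs on its own:

```shell
# Recreate the sample rows from the question in a temporary file (hypothetical path).
file=$(mktemp)
printf '%s\n' \
  '>name.1_i4_xyz_n' '>name.1_i1_xyz_n' '>name.1_i1_xyz_n' '>name.1_i1_xyz_m' \
  '>name.1_i2_xyz_n' '>name.1_i2_xyz_m' '>name.1_i7_xyz_m' '>name.1_i4_xyz_n' > "$file"

# Pass 1 records ids that have a row ending in m; pass 2 prints rows whose id
# was not recorded. cut then keeps fields 1, 2 and 4, as in the question.
result=$(awk 'BEGIN{FS="_"} NR==FNR{if($0~/m$/)d[$2];next} !($2 in d)' "$file" "$file" \
  | cut -d'_' -f1,2,4)

printf '%s\n' "$result"
rm -f "$file"
```

On the sample rows this should leave only the two i4 lines, shortened to `>name.1_i4_n`.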

Upvotes: 2

Socowi

Reputation: 27340

If the i... part does not appear in any other column, you can use

grep -vFf <(grep -E 'm$' file | cut -d _ -f 2) file

The part inside <() extracts every i... id that has at least one row ending with m. In your example: i1, i2, and i7.

The outer grep takes a list of literal search strings (inside the <()) and prints only the lines not containing any of the search strings.
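
As a rough illustration of the two stages, the sketch below recreates the sample rows from the question in temporary files (the paths come from `mktemp` and are purely illustrative), prints the id list produced by the inner pipeline, and then applies the outer grep. The last step is an assumed variation, not part of the command above: it wraps each id in underscores with sed so that a literal `i1` would not also match an id such as `i10`.

```shell
# Sample rows from the question, written to a temporary file (hypothetical path).
file=$(mktemp)
ids_file=$(mktemp)
printf '%s\n' \
  '>name.1_i4_xyz_n' '>name.1_i1_xyz_n' '>name.1_i1_xyz_n' '>name.1_i1_xyz_m' \
  '>name.1_i2_xyz_n' '>name.1_i2_xyz_m' '>name.1_i7_xyz_m' '>name.1_i4_xyz_n' > "$file"

# Stage 1: ids (second _-separated field) of every row ending in m.
grep -E 'm$' "$file" | cut -d _ -f 2 > "$ids_file"
ids=$(cat "$ids_file")
printf '%s\n' "$ids"            # i1, i2, i7 (one per line)

# Stage 2: drop every row that contains one of those ids as a literal string.
kept=$(grep -vFf "$ids_file" "$file")
printf '%s\n' "$kept"

# Assumed variation: anchor each id between underscores (_i1_) before matching.
sed 's/.*/_&_/' "$ids_file" > "$ids_file.anchored"
kept_anchored=$(grep -vFf "$ids_file.anchored" "$file")
printf '%s\n' "$kept_anchored"

rm -f "$file" "$ids_file" "$ids_file.anchored"
```

On this sample both variants keep the same two i4 rows; the anchored form only matters when one id is a prefix of another.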

Upvotes: 1

anubhava

Reputation: 786011

You can use awk like this:

awk -F_ '{if(/m$/) a[$2]; else rows[++n]=$0}
END{for (i=1; i<=n; i++) {split(rows[i], b, FS); if (!(b[2] in a)) print rows[i]}}' file

>name.1_i4_xyz_n
>name.1_i4_xyz_n

Upvotes: 1
