Reputation: 134
I have the following file containing n rows:
>name.1_i4_xyz_n
>name.1_i1_xyz_n
>name.1_i1_xyz_n
>name.1_i1_xyz_m
>name.1_i2_xyz_n
>name.1_i2_xyz_m
>name.1_i7_xyz_m
>name.1_i4_xyz_n
...
I want to delete rows that end with m. In the example the output would be:
>name.1_i4_n
>name.1_i4_n
...
Note that I've deleted i2 as it has two records and one of them ends with m. Same with i1.
Any help? I want to keep it simple and do it with just one line of code. This is what I have so far:
$ grep "i._.*." < input.txt | sort -k 2 -t "_" | cut -d'_' -f1,2,4
>name.1_i1_m
>name.1_i1_n
>name.1_i1_n
>name.1_i2_m
>name.1_i2_n
>name.1_i4_n
>name.1_i4_n
>name.1_i7_m
...
Upvotes: 0
Views: 124
Reputation: 1517
Another awk proposal.
awk '/_i4/&&!/_m$/' filterm.awk
>name.1_i4_xyz_n
>name.1_i4_xyz_n
Upvotes: 0
Reputation: 37464
To delete rows that end with m:
$ grep -v 'm$' file
>name.1_i4_xyz_n
>name.1_i1_xyz_n
>name.1_i1_xyz_n
>name.1_i2_xyz_n
>name.1_i4_xyz_n
Another solution that handles the ids, using awk and 2 runs:
$ awk 'BEGIN { FS="_" }       # set delimiter
NR==FNR {                     # on the first run
    if($0~/m$/)               # if it ends in an m
        d[$2]                 # make a del array entry of that index
    next
}
($2 in d==0)' file file       # on the second run don't print if index in del array
>name.1_i4_xyz_n
>name.1_i4_xyz_n
One-liner version:
$ awk 'BEGIN{FS="_"}NR==FNR{if($0~/m$/)d[$2];next}($2 in d==0)' file file
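If you also want the shortened form shown in the question (fields 1, 2 and 4 only), the same two-pass idea can print those fields directly instead of piping through cut. This is only a sketch under a few assumptions: input.txt is a placeholder file name, the sample rows are the eight shown in the question, and the array name bad is my own. It uses !($2 in bad), which is equivalent to the ==0 test above but sidesteps any doubt about operator precedence.

```shell
# Sample input copied from the question; input.txt is a hypothetical file name.
printf '%s\n' \
  '>name.1_i4_xyz_n' '>name.1_i1_xyz_n' '>name.1_i1_xyz_n' '>name.1_i1_xyz_m' \
  '>name.1_i2_xyz_n' '>name.1_i2_xyz_m' '>name.1_i7_xyz_m' '>name.1_i4_xyz_n' \
  > input.txt

# Pass 1 (NR==FNR): remember every ID (field 2) that has a row ending in m.
# Pass 2: print fields 1, 2 and 4 of rows whose ID was never remembered.
result=$(awk -F_ '
  NR == FNR { if (/m$/) bad[$2]; next }
  !($2 in bad) { print $1 "_" $2 "_" $4 }
' input.txt input.txt)

printf '%s\n' "$result"
```

On that sample this prints >name.1_i4_n twice, which matches the output requested in the question.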
Upvotes: 2
Reputation: 27340
If the i... part does not appear in any other column, you can use
grep -vFf <(grep -E 'm$' file | cut -d _ -f 2) file
The part inside <() extracts every i... that has a row ending with m. In your example: i1, i2, and i7.
The outer grep takes that list as literal search strings (the output of the <()) and prints only the lines that do not contain any of them.
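One caveat: because -F matches plain substrings, an ID such as i1 would also knock out rows for i10. If your IDs can be prefixes of one another, a possible workaround is to wrap each extracted ID in the _ delimiters before matching. The sketch below is not part of the original command; it assumes a grep that accepts -f - to read the pattern list from stdin (GNU grep does), and prefix.txt with its four rows is made-up sample data.

```shell
# Hypothetical sample: i1 is a prefix of i10; only i1 has a row ending in m.
printf '%s\n' \
  '>name.1_i1_xyz_n' '>name.1_i1_xyz_m' '>name.1_i10_xyz_n' '>name.1_i4_xyz_n' \
  > prefix.txt

# Collect the IDs of rows ending in m, wrap each one as _ID_, and pass the
# list to the outer grep on stdin (-f -) as fixed strings to exclude.
kept=$(grep 'm$' prefix.txt | cut -d _ -f 2 | sed 's/.*/_&_/' | grep -vFf - prefix.txt)

printf '%s\n' "$kept"
```

Here the i10 and i4 rows survive and both i1 rows are dropped. Without the sed step the i10 row would be removed as well.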
Upvotes: 1
Reputation: 786011
You can use awk like this:
awk -F_ '{if(/m$/) a[$2]; else rows[++n]=$0}
END{for (i=1; i<=n; i++) {split(rows[i], b, FS); if (!(b[2] in a)) print}}' file
>name.1_i4_xyz_n
>name.1_i4_xyz_n
Upvotes: 1