Nuthatch92

Reputation: 91

Problems using awk to delete a row with a specific value at a certain column

I have a data.anno file, composed of 6677 rows and 33 columns. As an example, in the first image you can see some of the rows of the data.anno file.

2953 of the rows contain "present" in the 10th column. I want to obtain a new file like the original, but without the rows that contain "present" in the 10th column. I've tried with this:

awk '$10!="present"' data.anno >> data_output.anno

but I encountered a problem: the output file still contains two rows with "present" in the 10th column, while the other 2951 rows containing "present" in the 10th column have correctly disappeared. Do you have any idea why this happens? Do you think there is a better way to obtain the output file I need?

In the second image you can see the two rows containing "present" that are still present in the output file after using awk. In the third image you can see some of the 2951 rows containing "present" that have correctly disappeared after using awk.

[Image 1: some of the rows of the data.anno file]

[Image 2: rows containing "present" that are still present in the output file after using awk]

[Image 3: some of the 2951 rows containing "present" that have correctly disappeared after using awk]

Upvotes: 2

Views: 639

Answers (1)

Ed Morton

Reputation: 203522

Your real input file, which has the countries in the 13th column, is tab-separated and has some fields that contain blanks, so you need to set FS to tab:

awk -F'\t' '$13 != "Italy"' file

Otherwise, any row with a blank inside a field before $13 will have that field split into multiple fields, and Italy will no longer be in the 13th field; it will be in the 14th or later.

Here's what's happening, using a more representative sample input file that has tab-separated fields (the cat -T is just to make the tabs visible):

$ cat file
ID      DAY     LOCALITY        OTHER
1       the weekend     Italy   stuff
2       mon     England stuff
3       wed     Italy   stuff
4       the weekend     Italy   stuff
5       sun     England stuff
6       thu     Italy   stuff

$ cat -T file
ID^IDAY^ILOCALITY^IOTHER
1^Ithe weekend^IItaly^Istuff
2^Imon^IEngland^Istuff
3^Iwed^IItaly^Istuff
4^Ithe weekend^IItaly^Istuff
5^Isun^IEngland^Istuff
6^Ithu^IItaly^Istuff

$ awk '$3!="Italy"' file
ID      DAY     LOCALITY        OTHER
1       the weekend     Italy   stuff
2       mon     England stuff
4       the weekend     Italy   stuff
5       sun     England stuff

$ awk -F'\t' '$3!="Italy"' file
ID      DAY     LOCALITY        OTHER
2       mon     England stuff
5       sun     England stuff
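
To check whether a file is affected by this, one option is to print the distinct field counts awk sees with each separator. This is only a sketch: sample.tsv and its contents are made up for illustration, and it has not been run against the real data.anno.

```shell
# Hypothetical tab-separated sample; the file name and rows are assumptions.
sample=sample.tsv
printf '%s\t%s\t%s\t%s\n' \
    ID DAY LOCALITY OTHER \
    1 'the weekend' Italy stuff \
    2 mon England stuff \
    3 wed Italy stuff > "$sample"

# Default FS splits on runs of blanks and tabs, so a field such as
# "the weekend" counts as two fields and the row gets a different NF.
default_nf=$(awk '{ print NF }' "$sample" | sort -nu | paste -sd' ' -)

# With FS set to tab, blanks inside a field no longer split it.
tab_nf=$(awk -F'\t' '{ print NF }' "$sample" | sort -nu | paste -sd' ' -)

echo "distinct field counts, default FS: $default_nf"
echo "distinct field counts, tab FS:     $tab_nf"
```

If the default FS reports more than one field count but the tab FS reports a single one, the file is most likely tab-separated with blanks inside some fields, and -F'\t' is the separator to use for the filter.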

Upvotes: 1
