Sootah

Reputation: 1811

Delete Specific Lines with AWK [or sed, grep, whatever]

Is it possible to remove lines from a file using awk? I'd like to find any lines that have Y in the last column and then remove any lines that match the value in column 2 of said line.

Before:

KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1,N
    ,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2,N
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1,Y
    ,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2,Y
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
    ,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N

So awk would find that rows 3 and 4 have Y in the last column, look at column 2 [TRACKINGKEY1], and remove all lines that have TRACKINGKEY1 in column 2.

Expected result:

KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
    ,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N

The reason for this is that our shipping program puts out a file whenever a shipment is processed, as well as when that shipment gets voided [in case of an error]. So what I end up with is the initial package info, then the same info indicating that it was voided, then yet another set of lines with the new shipment info. Unfortunately our ERP software has a fairly simple scripting language in which I can't even make an array so I'm limited to shell tools.

Thanks in advance!

Upvotes: 1

Views: 417

Answers (2)

glenn jackman

Reputation: 246744

This solution is kind of gross, but kind of fun.

grep ',Y$' file | cut -d, -f2 | sort -u | grep -vwFf - file
  • grep ',Y$' file -- find the lines with Y in the last column
  • cut -d, -f2 -- print just the tracking key from those lines
  • sort -u -- give just the unique keys
  • grep -vwFf - file --
    • read the unique tracking keys from stdin (-f -)
    • only consider them a match if they are whole words (-w)
    • they are fixed strings, not regular expressions (-F)
    • then exclude lines matching these patterns (-v) from file
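
To see what the final grep is matching against, you can run just the first three stages on their own (assuming the sample data above is saved as file); on this input they should produce a single key:

grep ',Y$' file | cut -d, -f2 | sort -u
TRACKINGKEY1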

Upvotes: 1

anubhava

Reputation: 784928

One way is to make two passes over the same file using awk:

awk -F, 'NR == FNR && $NF=="Y" && !($2 in seen){seen[$2]} 
          NR != FNR && !($2 in seen)' file file
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
    ,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N

Explanation:

NR == FNR                    # if processing the file the 1st time
&& $NF=="Y"                  # and the last field is Y
&& !($2 in seen) {           # and we haven't seen field 2 before
  seen[$2]                   # store field 2 as a key in array seen
}
NR != FNR                    # when processing the file the 2nd time
&& !($2 in seen)             # and array seen doesn't have field 2
                             # take the default action and print the line
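
Note that awk writes its result to stdout rather than editing file in place. If you need to overwrite the original, one option (a sketch; the temporary filename is just an example) is to redirect to a temp file and move it back:

awk -F, 'NR==FNR && $NF=="Y" && !($2 in seen){seen[$2]} NR!=FNR && !($2 in seen)' file file > file.tmp && mv file.tmp file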

Upvotes: 1
