Reputation: 1811
Is it possible to remove lines from a file using awk? I'd like to find any lines that have Y in the last column and then remove any lines that match the value in column 2 of said line.
Before:
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1,N
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2,N
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1,Y
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2,Y
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N
So awk would find that row 3 has Y in the last column, then look at column 2 [TRACKINGKEY1] and remove all lines that have TRACKINGKEY1 in column 2.
Expected result:
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N
The reason for this is that our shipping program puts out a file whenever a shipment is processed, as well as when that shipment gets voided [in case of an error]. So what I end up with is the initial package info, then the same info indicating that it was voided, then yet another set of lines with the new shipment info. Unfortunately our ERP software has a fairly simple scripting language in which I can't even make an array so I'm limited to shell tools.
Thanks in advance!
Upvotes: 1
Views: 417
Reputation: 246744
This solution is kind of gross, but kind of fun.
grep ',Y$' file | cut -d, -f2 | sort -u | grep -vwFf - file
grep ',Y$' file
-- find the lines with Y in the last columncut -d, -f2
-- print just the tracking key from those linessort -u
-- give just the unique keysgrep -vwFf - file
--
-f -
)-w
)-F
)-v
) from fileUpvotes: 1
Reputation: 784928
One way is to take 2 pass to same file using awk:
awk -F, 'NR == FNR && $NF=="Y" && !($2 in seen){seen[$2]}
NR != FNR && !($2 in seen)' file file
KEY1,TRACKINGKEY5,TRACKINGNUMBER1-3,PACKAGENUM1-3,N
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1,N
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1,N
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2,N
Explanation:
NR == FNR # if processing the file 1st time
&& $NF=="Y" # and last field is Y
&& !($2 in seen) { # we haven't seen field 2 before
seen[$2]} # store field 2 in array seen
}
NR != FNR # when processing the file 2nd time
&& !($2 in seen) # array seen doesn't have field 2
# take default action and print the line
Upvotes: 1