Reputation: 1232
I have stumbled upon this problem, which I solved, by hook or by crook. But I need your help in a precise solution. Being a beginner in awk/sed
, I could not solve it with a one liner (which I am sure there is), or an awk
script, though with a lot of pipes I could.
Here is the question:
I have a large .csv file with entries similar to this:
file (space delimited)
$ cat file
a d e r None
c f g r a
c f g r None
d a d e c
d a d e None
g f r t None
g f t r None
k f r e d
r e t y None
s c d er d
g f r t 4
There are no duplicates. But if you see carefully, there are entries in columns 1,2,3,4 that are repeating, and the only change is 5th column, with 'None'
. So I need to remove the rows (records) which are repeating in the 1,2,3,4 fields and that have None
in 5th column.
Here is the code i wrote, which worked, but no one would recommend:
awk '{print $5,$4,$3,$2,$1}' file | sed 's/None/zzz/g' | sort | awk '!array[$2,$3,$4,$5]++' | sed 's/zzz/None/g'
and here is the output that I got, and am expecting.
4 t r f g
a r g f c
c e d a d
d e r f k
d er d c s
None r e d a
None r t f g
None y t e r
Purpose of replacing None
with zzz
was after sorting, the rows will appear in the end, and awk
will remove the second occurrences of the duplicates from remaining columns.
Same is the reason for inverting column sequence and re-inverting it back. Sort.
Please if you could help. Thanks!
Upvotes: 1
Views: 753
Reputation: 195229
I got this solution:
awk '{s=$4" "$3" "$2" "$1; if($5=="None"&& s in a)next;else a[s]=$5" "s}END{for(i in a)print a[i]}' file|sort
it outputs:
kent$ awk '{s=$4" "$3" "$2" "$1; if($5=="None"&& s in a)next;else a[s]=$5" "s}END{for(i in a)print a[i]}' file|sort
4 t r f g
a r g f c
c e d a d
d er d c s
d e r f k
None r e d a
None r t f g
None y t e r
it seems that it is same as your expectation.
Upvotes: 3