Reputation: 55
I have a file that looks like this:
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4,
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7,
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo
I would like to extract a list of just the unique GUIDs where 1) the third column is not 0.0.0.0/0, and 2) the GUID appears on at least one such remaining line.
In this case, the desired output would be:
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c
Thinking this through, I feel like I should build an array/list of the unique GUIDs, then grep the matching lines and apply the two conditions above, but I just don't know the best way to go about this in a short script, or perhaps a grep/awk/sort/cut one-liner. I'd appreciate any help!
(the original file is a 4 column csv where the 4th column is often null)
Upvotes: 4
Views: 135
Reputation: 85837
Sounds like it could be done with a three-step pipeline:

1. Remove the lines whose third column is 0.0.0.0/0: grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,'
2. Extract the first column: cut -d, -f1
3. Deduplicate: sort -u (alternatively, if all duplicates are adjacent, uniq)

Putting it together:

grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' | cut -d, -f1 | sort -u
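A quick sanity check of that pipeline, recreating the sample data as file.csv (the filename is just an assumption for the demo):

```shell
# Recreate the sample input (file.csv is an assumed name)
cat > file.csv <<'EOF'
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4,
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7,
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo
EOF

# Drop the 0.0.0.0/0 rows, keep column 1, deduplicate
grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' file.csv | cut -d, -f1 | sort -u
```

This prints the two GUIDs that have at least one row with a specific IP.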
Upvotes: 0
Reputation: 1731
Just adding another possible solution, similar to (though uglier than, and using more than one command) the other proposed awk solution. If I understood the question correctly, your condition #2 is already taken into account by #1. In any case, the following awk+sort combination worked for me:
awk -F, '$3!~/^ 0\.0\.0\.0\/0/ {print $1}' file.csv | sort -u
Using the -u (unique) flag on sort, you'll exclude duplicates. Not completely foolproof, but it works in this case.

Hope it helps!
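As a quick demo of this variant (recreating the sample as file.csv, an assumed name): note that the pattern anchors on the leading space, since with -F, the space after each comma stays at the start of the field.

```shell
# Recreate the sample input (file.csv is an assumed name)
cat > file.csv <<'EOF'
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4,
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7,
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo
EOF

# "^ 0\.0\.0\.0\/0" matches because $3 begins with the space after the comma
awk -F, '$3!~/^ 0\.0\.0\.0\/0/ {print $1}' file.csv | sort -u
```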
Upvotes: 0
Reputation: 16997
Using awk
:
awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile
Explanation:
$3 !~ /0\.0\.0\.0\/0/ : field 3 doesn't match the regexp, and (&&)
!seen[$1]++ : field 1 has not been seen before (whenever awk sees a duplicate key in $1, the array value is incremented by 1; the logical negation makes the condition true only the first time)

- ! is the logical negation operator
- seen is an array
- $1 is the array key
- ++ is the increment operator (post-increment in this context)
- print $1 prints field 1

Test Results:
$ cat infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4,
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7,
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo
$ awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c
Upvotes: 2
Reputation: 92884
Awk
solution:
awk -F',[[:space:]]*' '$3 !~ /^(0\.){3}0\/0/{ guids[$1] }
END{ for(k in guids) print k }' testfile.txt
The output:
db86d211-0b09-4a8f-b222-a21a54ad2f9c
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
Upvotes: 1