mikernova

Reputation: 55

Extracting matching lines from a CSV

I have a file that looks like this:

64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4, 
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7, 
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo

I would like to extract a list of just the unique GUIDs where

  1. The GUID doesn't have a 0.0.0.0/0 in column 3, or
  2. Column 3 matches 0.0.0.0/0, the GUID appears on more than one line, and at least one of those lines is not 0.0.0.0/0

In this case, the desired output would be:

64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c

Trying to think through this, I feel like I should build an array/list of the unique GUIDs, then grep the matching lines for each one and apply the two conditions above, but I just don't know the best way to go about this in a short script, or perhaps a grep/awk/sort/cut one-liner. I'd appreciate any help!

(The original file is a 4-column CSV where the 4th column is often null.)

Upvotes: 4

Views: 135

Answers (4)

melpomene

Reputation: 85837

Sounds like it could be done with a three-step pipeline:

  1. Filter out rows where column 3 is 0.0.0.0/0: grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,'
  2. Select column 1: cut -d, -f1
  3. Only print unique elements: sort -u (alternatively, if all duplicates are adjacent, uniq)
Putting it all together:

grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' | cut -d, -f1 | sort -u
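For example, assuming the sample above is saved as file.csv (the filename is just for illustration), the pipeline produces the desired list:

$ grep -v '^[^,]*,[^,]*, *0\.0\.0\.0/0,' file.csv | cut -d, -f1 | sort -u
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c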

Upvotes: 0

Vinicius Placco
Vinicius Placco

Reputation: 1731

Just adding another possible solution, similar to (but uglier than, and using more than one command) the other proposed awk solution. If I understood the question correctly, your condition #2 is already taken into account by #1. In any case, the following awk+sort worked for me:

awk -F, '$3!~/^ 0\.0\.0\.0\/0/ {print $1}' file.csv | sort -u

Using the -u (unique) flag on sort excludes the duplicates. Not completely foolproof, but it works in this case.
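To see what -u buys you: assuming the sample data is in file.csv, the awk filter alone emits one GUID twice, and sort -u collapses the duplicate:

$ awk -F, '$3!~/^ 0\.0\.0\.0\/0/ {print $1}' file.csv
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c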

Hope it helps!

Upvotes: 0

Akshay Hegde

Reputation: 16997

Using awk:

awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile

Explanation:

  • $3 !~ /0\.0\.0\.0\/0/ means field 3 doesn't match the regexp, and (&&)
  • !seen[$1]++ means field 1 hasn't been seen before (whenever awk sees a duplicate key ($1), the array value is incremented by 1; we use logical negation so the test is true only for the first occurrence)
    • ! is the logical negation operator
    • seen is the array
    • $1 is the array key
    • ++ is the post-increment operator
  • print $1 prints field 1

Test Results:

$ cat infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 1.2.3.4, 
64fe12c7-b50c-4f63-b292-99f4ed74e5aa, ip, 4.5.6.7, 
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, silly string
bacd8a9d-807f-4ae9-95d2-f7cc17222cab, ip, 0.0.0.0/0, crazy town
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 8.9.0.1, wild wood
db86d211-0b09-4a8f-b222-a21a54ad2f9c, ip, 0.0.0.0/0, wacky tabacky
611f8cf5-f6f2-4f3a-ad24-12245652a7bd, ip, 0.0.0.0/0, cuckoo cachoo

$ awk -F, '$3 !~/0\.0\.0\.0\/0/ && !seen[$1]++{print $1}' infile
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
db86d211-0b09-4a8f-b222-a21a54ad2f9c

Upvotes: 2

RomanPerekhrest

Reputation: 92884

Awk solution:

awk -F',[[:space:]]*' '$3 !~ /^(0\.){3}0\/0/{ guids[$1] }
                       END{ for(k in guids) print k }' testfile.txt

The output:

db86d211-0b09-4a8f-b222-a21a54ad2f9c
64fe12c7-b50c-4f63-b292-99f4ed74e5aa
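Note that for (k in guids) visits keys in unspecified order, which is why the GUIDs come out in a different order here. If a deterministic order matters, a minimal sketch (assuming GNU awk, whose PROCINFO["sorted_in"] controls iteration order) is:

awk -F',[[:space:]]*' '$3 !~ /^(0\.){3}0\/0/{ guids[$1] }
                       END{ # GNU awk only: iterate keys in ascending string order
                            PROCINFO["sorted_in"] = "@ind_str_asc"
                            for(k in guids) print k }' testfile.txt

Piping the original command's output through a plain sort achieves the same with any awk.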

Upvotes: 1
