Reputation: 344

Filter out rows from column A based on values in column B

I have a CSV file with two columns. The first column lists all subscribers, and the second column lists subscribers who must be excluded from a mailing:

all,exclusions
[email protected],[email protected]
[email protected],[email protected]
[email protected]
[email protected]
[email protected]

I need to output every subscriber from the first column who is not listed anywhere in the second column. The desired output is something like this:

[email protected]
[email protected]
[email protected]

So far all I have is this:

awk -F, '(NR>1) {if($1!=$2) {print}}' subs.csv

This, of course, only filters out a row when both columns hold the same value on that same row. Thanks for any help.

Upvotes: 3

Answers (3)

Carlos Pascual

Reputation: 1126

Use two arrays. The first field, $1, holds all subscribers and is used as the index of an array called a. The second field, $2, holds the subscribers to exclude and is used as the index of an array called b. In the END block, for (i in a) if (!(i in b)) print i prints every subscriber from the first column who is not listed in the second column. Note that for (i in a) visits the indices in an unspecified order, so the output order may differ from the input order.

awk -v FS=',' '
        NR > 1 {a[$1];b[$2]}
        END{for (i in a) if (!(i in b)) print i}
' file
[email protected]
[email protected]
[email protected]

Alternatively, use the continue statement, which skips straight to the next iteration of the loop:

awk -v FS=',' '
        NR > 1 {a[$1];b[$2]}
        END{for (i in a) if (i in b) continue;else print i}
' file
[email protected]
[email protected]
[email protected]
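If the output must keep the order of the first column, one possible variant is to read the file twice instead of looping over an array in END. This is a sketch of a different technique (the two-pass NR==FNR idiom), not the method from the answer above. The function name filter_subs and the example.com addresses are made up for illustration.

```shell
# Hypothetical sample data; the addresses are placeholders, not from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
all,exclusions
a@example.com,c@example.com
b@example.com,a@example.com
c@example.com
d@example.com
e@example.com
EOF

# filter_subs FILE (hypothetical helper name)
# Pass 1 (NR == FNR): collect the non-empty values of column 2.
# Pass 2: print column 1 values that were not collected, in file order.
filter_subs() {
  awk -F, '
    FNR == 1  { next }                           # skip the header in both passes
    NR == FNR { if ($2 != "") excl[$2]; next }   # first pass: remember exclusions
    !($1 in excl) { print $1 }                   # second pass: keep the rest
  ' "$1" "$1"
}

result=$(filter_subs "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Passing the same file name twice makes awk read it twice. NR==FNR is true only while the first copy is being read, which is why the exclusion list is complete before any subscriber is printed.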

Upvotes: 1

Cyrus

Reputation: 88819

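With a single array used as a counter: each address is incremented when it appears in the first column and decremented when it appears in the second, so an address that ends with a count of 1 appears only in the first column. I assume that there are no duplicates in the first column.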

awk -F ',' 'NR>1{
              array[$1]++; array[$2]--
            }
            END{
              for(i in array){ if(array[i]==1){ print i } }
            }' file

As one line:

awk -F ',' 'NR>1{ array[$1]++; array[$2]-- } END{for(i in array){ if(array[i]==1){ print i } } }' file

Output:

[email protected]
[email protected]
[email protected]
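To see why the no-duplicates assumption matters, here is a small sketch that runs the same counter logic on made-up data in which two addresses are repeated in the first column. The example.com addresses and the variable name counter_result are invented for illustration, and the output is piped through sort only to make the order predictable.

```shell
# Hypothetical data: a@ is listed twice and excluded once, b@ is listed twice.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
all,exclusions
a@example.com,a@example.com
a@example.com
b@example.com
b@example.com
c@example.com
EOF

# Same counter approach as above: +1 per column-1 hit, -1 per column-2 hit.
# Final counts: a@ = 2 - 1 = 1, b@ = 2, c@ = 1.
counter_result=$(awk -F ',' 'NR>1{ array[$1]++; array[$2]-- }
  END{ for(i in array){ if(array[i]==1){ print i } } }' "$tmp" | sort)
printf '%s\n' "$counter_result"
rm -f "$tmp"
```

With this input the excluded address a@example.com is printed because its count ends at 1, while b@example.com is dropped because its count ends at 2. The counter approach is therefore only safe when each address appears at most once per column.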

Upvotes: 2

Andre Wildberg

Reputation: 19191

For completeness, this version removes excluded entries even when values are repeated, and prints each remaining subscriber once.

Data

$ cat file
all,exclusions
[email protected],[email protected]
[email protected],[email protected]
[email protected]
[email protected]
[email protected],[email protected]
[email protected],[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

$ awk -F ',' 'NR>1 && NF==1{ all[$1]++ }
  NR>1 && NF==2{ all[$1]++; excl[$2]++ }
  END{ for(i in excl){ all[i]=0 };
    for(i in all){ if(all[i]>=1){ print i } } }' file

[email protected]
[email protected]
[email protected]

Upvotes: 2
