neo2049
neo2049

Reputation: 143

Remove row for 2nd CSV if field in CSV 2 and CSV 1 has a field match

I have 2 CSVs and I that looks something like the following

CSV 1

ID
001
002

CSV 2

ID   Name   Optin   Date
001  Adam   TRUE    2020/01/01
002  Eve    TRUE    2019/11/01
003  John   FALSE   2019/07/22

I would like only the row with ID 003 in the output file, please. Because there are between 3-5 million rows, it'll have to be something in terminal and not excel.

Sorry the tables look so bad.

Thanks in advance.

Upvotes: 0

Views: 36

Answers (1)

tshiono
tshiono

Reputation: 22022

Would you try the following:

awk 'NR==FNR {if (FNR>1) seen[$1]++; next} !seen[$1] {print}' csv1.txt csv2.txt

Lines broken for explanation:

awk '
    NR==FNR {
                        # this block is evaluated while reading csv1.txt only
        if (FNR > 1)    # skip the header line of csv1.txt
            seen[$1]++  # memorize the occurance of the ID
        next            # prevents from continuing to the next block
    }
                        # the lines below is evaluated while reading csv2.txt only
    ! seen[$1] {        # if the ID is not in the array (not included in csv1.txt)
        print           # then print the row
    }
' csv1.txt csv2.txt
  • The condition NR==FNR is met only while reading the first file in the argument list (csv1.txt in this case). This is a common idiom to execute different codes for each file.
  • The statement seen[$1]++ sets a mark on the 1st field of the input line. You can easily know if the ID is included in csv1.txt while processing csv2.txt.

Upvotes: 2

Related Questions