Reputation: 143
I have 2 CSVs and I that looks something like the following
ID
001
002
ID Name Optin Date
001 Adam TRUE 2020/01/01
002 Eve TRUE 2019/11/01
003 John FALSE 2019/07/22
I would like only the row with ID 003 in the output file, please. Because there are between 3-5 million rows, it'll have to be something in terminal and not excel.
Sorry the tables look so bad.
Thanks in advance.
Upvotes: 0
Views: 36
Reputation: 22022
Would you try the following:
awk 'NR==FNR {if (FNR>1) seen[$1]++; next} !seen[$1] {print}' csv1.txt csv2.txt
Lines broken for explanation:
awk '
NR==FNR {
# this block is evaluated while reading csv1.txt only
if (FNR > 1) # skip the header line of csv1.txt
seen[$1]++ # memorize the occurance of the ID
next # prevents from continuing to the next block
}
# the lines below is evaluated while reading csv2.txt only
! seen[$1] { # if the ID is not in the array (not included in csv1.txt)
print # then print the row
}
' csv1.txt csv2.txt
NR==FNR
is met only while reading the first file
in the argument list (csv1.txt
in this case).
This is a common idiom to execute different codes for each file.seen[$1]++
sets a mark on the 1st field of the input line.
You can easily know if the ID is included in csv1.txt
while processing
csv2.txt
.Upvotes: 2