Reputation: 27

Matching third field in a CSV with pattern file in GNU Linux (AWK/SED/GREP)

I need to print every line of a CSV file whose third field matches a pattern in a pattern file.

I tried grep with no luck, because it matches the patterns against any field, not only the third.

grep -f FILE2 FILE1 > OUTPUT

FILE1

dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832

FILE2

RIGHT OUTPUT

dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346

WRONG OUTPUT

dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567   <---- I don't want this to appear
sadsad,0,98745,1,gfdkjh,45346

I have already searched everywhere and tried different formulas.

EDIT: thanks to Wintermute, I managed to write something like this:

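# write to a separate file: redirecting onto file1.csv would truncate it before it is read
csvquote file1.csv > file1.quoted.csv
awk -F '"' 'FNR == NR { patterns[$0] = 1; next } patterns[$6]' file2.csv file1.quoted.csv | csvquote -u > result.csv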

csvquote helps with parsing CSV files in AWK.

Thank you very much everybody, great community!

Upvotes: 1

Answers (4)

Tiago Lopo

Reputation: 7959

Using grep and sed:

grep -f <( sed -e 's/^\|$/,/g' file2) file1
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346

Explanation:

We insert a comma at the beginning and at the end of each line of file2, without modifying the file itself, and then grep just as you were already doing.
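
Wrapping each pattern in commas matches it as any interior field, which is enough for the sample data but is not tied to the third field. A minimal sketch of a stricter variant follows, assuming the patterns contain no regex metacharacters: each pattern is anchored behind exactly two leading fields. The first data line and the temporary file names are hypothetical, added only to show the difference.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Hypothetical data: the first line has 00567 in field 2, not field 3.
cat > "$tmp/file1" <<'EOF'
aaaaaa,00567,11111,1,lkjiou,85249
dasdas,0,00567,1,lkjiou,85249
EOF
printf '%s\n' 00567 > "$tmp/file2"

# Comma-wrapped pattern: matches 00567 as any interior field.
sed 's/.*/,&,/' "$tmp/file2" > "$tmp/loose"
loose=$(grep -f "$tmp/loose" "$tmp/file1")

# Anchored pattern: skip exactly two fields, then require the pattern.
sed 's/.*/^[^,]*,[^,]*,&,/' "$tmp/file2" > "$tmp/strict"
strict=$(grep -f "$tmp/strict" "$tmp/file1")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

The loose form keeps both lines, while the anchored form keeps only the line whose third field is 00567.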

Upvotes: 1

NeronLeVelu

Reputation: 10039

{ sed 's#.*#/^[^,]*,[^,]*,&,/b#' FILE2; echo d; } >/tmp/File2.sed && sed -f /tmp/File2.sed FILE1;rm /tmp/File2.sed

This is harder to do in plain sed than in awk, but it should work if awk is not available.

The same with egrep (useful on huge files):

sed 's#.*#^[^,]*,[^,]*,&,#' FILE2 >/tmp/File2.egrep && egrep -f /tmp/File2.egrep FILE1;rm /tmp/File2.egrep

Upvotes: 0

Wintermute

Reputation: 44063

With awk:

awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' file2 file1

This works as follows:

FNR == NR {           # when processing the first file (the pattern file)
  patterns[$0] = 1    # remember the patterns
  next                # and do nothing else
}
patterns[$3]          # after that, select lines whose third field
                      # has been seen in the patterns.
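
To try the command end to end, here is a minimal sketch that recreates the sample data in a temporary directory. The pattern file contents (00567 and 98745) are an assumption inferred from the expected output in the question, which does not show them. The `$3 in patterns` test is a swapped-in membership check: it selects the same lines here, but does not create an empty array entry for every third field it looks up.

```shell
#!/bin/sh
# Sketch: recreate the sample data and filter on the third field.
# Assumption: the pattern file holds 00567 and 98745 (inferred from the
# expected output; the question does not show its contents).
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
EOF

printf '%s\n' 00567 98745 > "$tmp/file2"

# First file: store each pattern as an array key.
# Second file: print lines whose third field is one of those keys.
result=$(awk -F, 'FNR == NR { patterns[$0]; next } $3 in patterns' \
  "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Both forms compare the third field as an exact string, so regular expressions in the pattern file would not be interpreted as such.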

Upvotes: 5

user2968573

Reputation: 31

This can be a start (it prints only the matching third fields, not the whole lines):

for i in $(cat FILE2); do cut -d',' -f3 FILE1 | grep "$i"; done
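
To get from that starting point to whole lines, one option is to let awk do the field comparison inside the loop. This is only a sketch, assuming one literal pattern per line in FILE2 with no backslashes; the sample pattern values are an assumption as well. It reads FILE1 once per pattern, so a single-pass approach should scale better on large inputs.

```shell
#!/bin/sh
tmp=$(mktemp -d)

cat > "$tmp/FILE1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
sadsad,0,98745,1,gfdkjh,45346
EOF
# Assumed pattern values, one per line.
printf '%s\n' 00567 98745 > "$tmp/FILE2"

# For each pattern, print the full lines whose third field equals it.
# Concatenating "" forces a string comparison, so 00567 does not
# compare equal to 567 as a number.
out=$(while IFS= read -r pat; do
  awk -F, -v p="$pat" '($3 "") == p' "$tmp/FILE1"
done < "$tmp/FILE2")

printf '%s\n' "$out"
rm -rf "$tmp"
```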

Upvotes: 0
