Abhishek Rath

Reputation: 7

Filter records from one file based on values present in another file using Unix

I have an input CSV file.

Input feed

PK,Col1,Col2,Col3,Col4,Col5    
A,1,2,3,4,5
B,1,A,B,C,D
C,1,2,3,4
D,2,1,2,3
E,5,1,1,1
F,8,1,1,1

There is an error CSV file, generated from this input file, which contains the primary key.

Error File

    Pk,Error_Reason   
    D,Failure
    E, Failure
    F, Failure

I want to extract all the records from the input file for which there is a primary key entry in the error file, and save them into a new file.

Basically my new file should look like this:

New Input feed

PK,Col1,Col2,Col3,Col4,Col5    
D,2,1,2,3
E,5,1,1,1
F,8,1,1,1

I am a beginner in Unix and I have tried the awk command.

The approach I have tried is to first get all the primary key values into a file:

awk -F"," '{print $1}' error.csv >> error_pk.csv

Now I need to filter out the records from input.csv for all the primary key values present in error_pk.csv.
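For illustration, a minimal sketch of how that second step could look in awk, assuming error_pk.csv ends up holding one key per line and that the header row should be kept (new_input.csv is just an illustrative output name):

# Read the key file first and remember each key, then print the header
# of input.csv plus every data row whose first field is a remembered key.
awk -F, 'NR==FNR { keys[$1]; next }
         FNR==1 || ($1 in keys)' error_pk.csv input.csv > new_input.csv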

Upvotes: 0

Views: 811

Answers (3)

Walter A

Reputation: 20002

You can use grep -f with a file of search patterns: cut each line of the error file off at the , and anchor it at the start of the line.

grep -Ef <(sed -r 's/([^,]*).*/^\1,/' file2) file1

When you want a header in the output, print the header line of file1 first and apply the grep only to the remaining lines.
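A minimal sketch of that variant, under the same assumptions as the command above (new_file is just an illustrative output name):

# Print the header of file1 untouched, then filter only the data lines.
{ head -n 1 file1
  tail -n +2 file1 | grep -Ef <(sed -r 's/([^,]*).*/^\1,/' file2) ; } > new_file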

Upvotes: 2

James Brown

Reputation: 37404

Using awk. As there is leading space in the error file, it needs to be trimmed off first; I'm using sub for that. Then, since the titles of the first column are not identical (PK vs Pk), the header needs to be handled separately with FNR==1:

$ awk -F, '                      # set separator
NR==FNR {                        # process the first file
    sub(/^ */,"")                # trim leading space
    a[$1]                        # hash the first column
    next  
}
FNR==1 || ($1 in a)' error input # output the header record and rows whose key was hashed

Output:

PK,Col1,Col2,Col3,Col4,Col5    
D,2,1,2,3
E,5,1,1,1
F,8,1,1,1

Upvotes: 1

KamilCuk

Reputation: 141040

You can use join.

First, remove everything after the comma from the second file.
Then join on the first field of both files:

cat <<EOF >file1
PK,Col1,Col2,Col3,Col4,Col5    
A,1,2,3,4,5
B,1,A,B,C,D
C,1,2,3,4
D,2,1,2,3
E,5,1,1,1
F,8,1,1,1
EOF

cat <<EOF >file2
PK,Error_Reason   
D,Failure
E,Failure
F,Failure
EOF

join -t, -11 -21 <(sort -k1 file1) <(cut -d, -f1 file2 | sort -k1)

If you need the output to be sorted according to file1, you can number the lines in the first file, join the files, re-sort using the line numbers, and then remove the numbers from the output:

join -t, -12 -21 <(nl -w1 -s, file1 | sort -t, -k2) <(cut -d, -f1 file2 | sort -k1) |
sort -t, -k2 | cut -d, -f1,3-

Upvotes: 1
