Reputation: 109
I hope you're all having a jolly nice day. I have a question which I could do with some help on. I have two CSV
files, both of which contain 250k records. One looks like this:
ProductCode,StockLevel,Active
4101385360060,0,1
6019002320030,0,0
3328200078030,0,0
2411000347030,93,1
And the other like this:
ProductURL,Product Code,Product Name,More Fields
"http://www.domain.com",4101385360060,Timberland 6 In Buck Boot WHEAT NUBUCK
What I want to do with the first file is remove all records which don't meet these conditions (i.e. keep only rows where StockLevel is not 0 and Active is 1):
<All ok>,<!=0>,<=1>
Then with the second file remove all rows which are NOT
in the first file.
Any help - hugely appreciated. I suspect that awk
is my friend.
Cheers,
Matt.
Upvotes: 1
Views: 175
Reputation: 203209
It sounds like this is what you want:
$ cat tst.awk
BEGIN { FS="," }
FNR==1 { out=FILENAME"_out"; valid[$1] }
NR==FNR && ($2!=0) && ($3==1) { valid[$1] }
$1 in valid { print > out }
$ awk -f tst.awk file1 file2
$ cat file1_out
ProductCode,StockLevel,Active
2411000347030,93,1
$ cat file2_out
ProductURL,Product Code,Product Name,More Fields
If not, edit your question to clarify your requirements and provide more truly representative sample input and the associated expected output.
Upvotes: 2
Reputation: 14945
The OP commented that the id is not the first field of the second file, so one way could be:
# keep file1 rows with stock and Active=1, and turn each code into a ",code," pattern
awk -F, 'NR>1 && $2 && $3==1' file1 | tee file1_filtered | while IFS=, read -r id dummy; do
  echo ",$id,"
done > pattern_file
# keep only file2 rows that contain one of those codes between commas
grep -f pattern_file file2 > file2_filtered
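If you'd rather skip the intermediate pattern file, the same idea works in a single awk pass. This is only a sketch (the keep[] array name is mine, and since we only know the code is not the first field of file2, it checks every field for a match, like the grep above):
awk -F, '
  NR==FNR { if (FNR>1 && $2 && $3==1) keep[$1]; next }       # file1: remember the wanted codes
  { for (i=1; i<=NF; i++) if ($i in keep) { print; next } }  # file2: keep rows containing one of them
' file1 file2 > file2_filtered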
Upvotes: 1
Reputation: 784948
You can use this awk command:
awk -F, 'FNR==NR && $2!=0 && $3==1{a[$1]; next} $1 in a' file1.csv file2.csv
However, I suggest you provide more sample data with the expected output in your question.
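If you also want to keep the header row from file2.csv, a small tweak along the same lines (just a sketch, untested on your full data) would be:
awk -F, 'FNR==NR{if ($2!=0 && $3==1) a[$1]; next} FNR==1 || ($1 in a)' file1.csv file2.csv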
Upvotes: 2