Reputation:
I have a file containing lines like:
"ALMEREWEG ";" 45 ";" ";"ZEEWOLDE ";"3891ZN"
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 51 ";" ";"ZEEWOLDE ";"3891ZN"
"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
and I have a second file containing lines like:
3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;
Now I want to grep the first file based on the pattern of the second file, where:
A) the 1st column of the 2nd file is present in the 5th column of the 1st file; and
B) the 2nd column of the 2nd file is present in the 2nd column of the 1st file.
My question: how to do this?
Update 7 July 2013: I updated file2 format to reflect the third column (number suffices).
Upvotes: 1
Views: 1631
Reputation: 46365
Heavily borrowing from @JS, I offer the following improved solution. The problem with his code is that if you have more than one house number in the same zip code, it will only match the last one. By creating a composite associative array (if that is the name... basically joining the two fields together), you get around this problem:
Create a file postcode.awk
:
BEGIN {
FS=";"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
a[$1,$2]=1 # create one array element for each $1/$2 pair
next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
# copy the original line before modifying it
line=$0
# take out the double quotes
gsub(/\"/,"")
# take out the spaces on either side of the semicolons
gsub(/ *; */,";")
# see if the associative array element exists:
if (a[$5,$2]==1) {
# echo the original line that matched:
print line
}
}
Using test file file1
as follows (I have added a line to show the border case):
"ALMEREWEG ";" 45 ";" ";"ZEEWOLDE ";"3891ZN"
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
And key file file2
with (again, added a line):
3891ZP;50
3891ZP;52
3891ZN;53
You will see that JS's code will not match the line with number 50.
But my code does:
awk -f postcode.awk file2 file1
produces
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
Upvotes: 1
Reputation: 157947
I have splitted file2 into columns using bash's IFS
and read
. Then passed the columns to grep:
# read line by line
while IFS=$'\n' read line ; do
# split into columns
IFS=$';' read -a col <<< "$line"
# the expression can be refined but should work well as is
grep -e ' '${col[1]}' ";".*;.*";"'${col[0]} file1
done < file2
Output:
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
Upvotes: 0
Reputation: 77085
One way with awk
:
awk -F';' '
NR==FNR {
a[$1]=$2
next
}
{
line=$0
gsub(/\"/,"")
gsub(/ *; */,";")
if (a[$5]==$2) {
print line
line=""
}
}' file2 file1
Output:
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
Upvotes: 2
Reputation: 65781
You can use something like sed
to construct patterns for grep
:
$ grep -Ef <(sed -r 's/(.*);(.*)/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
Upvotes: 0