user1103309
user1103309

Reputation:

Grep file with two columns as input

I have a file containing lines like:

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 51  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

and I have a second file containing lines like:

3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;

Now I want to grep the first file based on the pattern of the second file, where:

A) the 1st column of the 2nd file is present in the 5th column of the 1st file; and

B) the 2nd column of the 2nd file is present in the 2nd column of the 1st file.

My question: how to do this?

Update 7 July 2013: I updated file2 format to reflect the third column (number suffices).

Upvotes: 1

Views: 1631

Answers (4)

Floris
Floris

Reputation: 46365

Heavily borrowing from @JS, I offer the following improved solution. The problem with his code is that if you have more than one house number in the same zip code, it will only match the last one. By creating a composite associative array (if that is the name... basically joining the two fields together), you get around this problem:

Create a file postcode.awk:

BEGIN {
  FS=";"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
  a[$1,$2]=1 # create one array element for each $1/$2 pair
  next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
  # copy the original line before modifying it
  line=$0
  # take out the double quotes
  gsub(/\"/,"")
  # take out the spaces on either side of the semicolons
  gsub(/ *; */,";")
  # see if the associative array element exists:
  if (a[$5,$2]==1) {
    # echo the original line that matched:
    print line
  }
}

Using test file file1 as follows (I have added a line to show the border case):

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

And key file file2 with (again, added a line):

3891ZP;50
3891ZP;52
3891ZN;53

You will see that JS's code will not match the line with number 50.

But my code does:

awk -f postcode.awk file2 file1

produces

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Upvotes: 1

hek2mgl
hek2mgl

Reputation: 157947

I have splitted file2 into columns using bash's IFS and read. Then passed the columns to grep:

# read line by line
while IFS=$'\n' read line ; do
    # split into columns
    IFS=$';' read -a col <<< "$line"
    # the expression can be refined but should work well as is
    grep -e ' '${col[1]}'  ";".*;.*";"'${col[0]} file1
done < file2

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Upvotes: 0

jaypal singh
jaypal singh

Reputation: 77085

One way with awk:

awk -F';' '
NR==FNR {
  a[$1]=$2
  next
}
{
  line=$0
  gsub(/\"/,"")
  gsub(/ *; */,";")
  if (a[$5]==$2) {
    print line
    line=""
  }
}' file2 file1

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Upvotes: 2

Lev Levitsky
Lev Levitsky

Reputation: 65781

You can use something like sed to construct patterns for grep:

$ grep -Ef <(sed -r 's/(.*);(.*)/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Upvotes: 0

Related Questions