Using matching entries only, print a file A line if its column value is between two other column values in file B

Question

I have a tab-delimited file, file1:

A 1
A 20
B 17
B 33
C 10
C 20
E 7

and another tab-delimited file, file2.

I need to print the lines of file1 for which col1 of file1 equals col1 of file2 and the value in col2 of file1 falls within one of the ranges given by cols 2 and 3 of file2.

The output would look like

A 1
A 20
B 33
C 10
C 20

I'm trying

awk 'FNR==NR{a[$1]=$2;next}; ($1) in a{if($2=(a[$1] >= $2 && a[$1] <=$3) {print}}1'  file1  file2

But it's not working.

jhnc · Accepted Answer

To store multiple ranges, you really want to use arrays of arrays or lists. POSIX awk doesn't support them directly (GNU awk 4.0 and later does have true arrays of arrays), but they can be emulated with composite keys. In this case arrays of arrays seem likely to be more efficient.

awk '
    # store each range from file2
    FNR==NR {
        n = ++q[$1]
        min[$1 FS n] = $2
        max[$1 FS n] = $3
        next
    }

    # process file1
    n = q[$1] { # if no q entry, line cannot be in range
        for (i=1; i<=n; i++)
            if ( min[$1 FS i]<=$2 && $2<=max[$1 FS i]) {
                print
                next
            }
    }
' file2 file1

Each min/max range needs to be stored separately. By maintaining a counter (q[$1]) of how many times each col1 value ($1) has occurred in file2, we give every range its own array element, keyed by $1 FS n.

Later, when checking a line of file1, we know that its col1 value has exactly q[$1] ranges stored, so the loop tests each of them in turn and stops at the first match.
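
To try the approach end to end, here is a minimal sketch. The contents of file2 are not shown in the question, so the ranges below are hypothetical, chosen only so that the result matches the output the question asks for. The temporary directory and the result variable are likewise just scaffolding for the demonstration, and the pattern is parenthesized purely for readability.

```shell
# Hypothetical sample data: file2 is NOT shown in the question,
# so these ranges are assumptions made up for illustration.
tmp=$(mktemp -d)
printf 'A\t1\nA\t20\nB\t17\nB\t33\nC\t10\nC\t20\nE\t7\n' > "$tmp/file1"
printf 'A\t1\t5\nA\t15\t25\nB\t30\t40\nC\t5\t25\nD\t1\t100\n' > "$tmp/file2"

result=$(awk -F'\t' '
    # first file (file2): store every range under a composite key "col1 FS n"
    FNR==NR {
        n = ++q[$1]
        min[$1 FS n] = $2
        max[$1 FS n] = $3
        next
    }

    # second file (file1): test the line against each stored range for its col1
    (n = q[$1]) {
        for (i = 1; i <= n; i++)
            if (min[$1 FS i] <= $2 && $2 <= max[$1 FS i]) {
                print
                next    # one match is enough; avoid printing the line twice
            }
    }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this assumed file2, "A 20" is rejected by the first A range (1 to 5) but accepted by the second (15 to 25), which is the case a single a[$1]=$2 assignment cannot handle because each new range overwrites the previous one. "E 7" is skipped because E has no entry in q.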
