Partial string match using AWK

Question

Good morning guys!

I have the following lines in file.txt:

big bird|Big birds tend to be big.;;They have wings.
big truck|I love big trucks!;;Ford makes nice big trucks. 
red truck|Red is my favorite color.;;Red is also a name.

My two delimiters are the pipe (|) and the double semicolons (;;). My script takes an input, matches it to a line in file.txt BEFORE the pipe (|) and randomly returns a corresponding output after the pipe delimited by (;;).

read -p " " query
response="$(awk -F\| -v r="$query" '$1==r{print $2;exit}' file.txt | sed 's/;;/\n/g' | shuf -n1)"
echo "$response"

Example:

Input: big truck
Output: Ford makes nice big trucks.

But this doesn't work unless the input is an exact match. How can I modify the awk expression so that it accepts partial matches, or cases where the two words are inverted (e.g. "truck big" instead of "big truck")?

Desired behavior matching "big" and "truck" and returning a random output from the line:

Input: some trucks are big
Output: Ford makes nice big trucks.

Many MANY thanks in advance!

Luuk · Accepted Answer

script 'do.awk':

BEGIN{
        split(input,s," ");
        for (i in s) s2[s[i]]=i;
        srand();
}
{
        split($1,a," ");
        m=0;
        for(i in a) {
                if (a[i] in s2) m++;
        }
        # add if words match (m>0), first matchcount, then $2
        if (m>0) {
                r[z++]=m";;"$2
                }
}
END {
        # sort array r, last line will have highest matchcount
        n = asort(r);
        # print last value
        # print r[n];
        # get random piece, but exclude r2[1], because it is matchcount.
        x=split(r[n],r2,";;")-1;
        x = int(rand()*x);
        print  r2[x+2];
}

awk -v input="big red" -F\| -f do.awk file.txt will output either "Red is also a name." or "Red is my favorite color."
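Note that the script above compares whole words, so "trucks" in the input would not count as a match for "truck" in the file. One possible way to loosen the test is a substring check with index(). The following is only a sketch of that idea; the helper name count_matches is hypothetical, and the comparison is still case-sensitive:

```shell
# Hypothetical helper: count_matches QUERY KEY
# Prints how many words of KEY occur as a substring of some word in QUERY.
count_matches() {
    awk -v input="$1" -v key="$2" 'BEGIN {
        n = split(input, q, " ")      # words typed by the user
        k = split(key, a, " ")        # words before the pipe
        m = 0
        for (i = 1; i <= k; i++)
            for (j = 1; j <= n; j++)
                if (index(q[j], a[i]) > 0) { m++; break }   # count each key word once
        print m
    }'
}
```

With this, count_matches "some trucks are big" "big truck" counts both "big" and "truck", while an exact word comparison would count only "big".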

I replaced the random selection, for which you used shuf, with awk's rand() function. (I do hope it's random enough.)
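One caveat on randomness: in gawk, srand() without an argument seeds from the current time in seconds, so two runs within the same second make the same pick. If that matters, a seed can be passed in from the shell. A minimal sketch, assuming od and /dev/urandom are available; the helper name random_piece is hypothetical:

```shell
# Hypothetical helper: random_piece TEXT
# Prints one random ";;"-separated piece of TEXT.
random_piece() {
    # Assumption: /dev/urandom exists; read 4 bytes as an unsigned integer seed.
    seed=$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')
    awk -v text="$1" -v seed="$seed" 'BEGIN {
        srand(seed)                        # explicit seed instead of the clock
        n = split(text, piece, ";;")
        print piece[int(rand() * n) + 1]   # rand() is in [0,1), so the index is 1..n
    }'
}
```

For example, random_piece "Red is my favorite color.;;Red is also a name." prints one of the two sentences.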

EDIT: I noticed you have another post about the RANDOM stuff here: BASH - Regex match line in file.txt with more than one delimiter

EDIT2: The matchcount can be the same for multiple lines. If this is true, we should add all the texts and then pick a random one.

I changed the END section as follows:

END {
        # sort array r, last line(s) will have highest matchcount
        n = asort(r);
        split(r[n],t,";;");
        m = t[1];
        # add all texts with matchcount=m
        o="";
        for(i=length(r); i>=1; i--) {
                split(r[i],t,";;")
                if(t[1]==m) {
                        for (j=2; j<=length(t); j++) {
                                o=o";;"t[j];
                        }
                } else {
                        break;
                }
        }

        # get random piece, from o.
        x=split(o,r2,";;")-1;
        m=r2[1];
        x = int(rand()*x);
        print  r2[x+2];
}

EDIT: Finally, I should have mentioned that asort() is only available in gawk.
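Where gawk is not available, the sort can be avoided by remembering the highest matchcount while reading the file and collecting only the texts of the lines that reach it. This is a sketch of that alternative, not the script above; the function name pick_response is hypothetical:

```shell
# Hypothetical helper: pick_response QUERY FILE
# Prints a random text from the line(s) whose key shares the most words with QUERY.
pick_response() {
    awk -F'|' -v input="$1" '
    BEGIN {
        # Index the query words for quick lookup.
        n = split(input, s, " ")
        for (i = 1; i <= n; i++) want[s[i]] = 1
        srand()
        best = 0; count = 0
    }
    {
        # Count how many key words of this line occur in the query.
        k = split($1, a, " ")
        m = 0
        for (i = 1; i <= k; i++) if (a[i] in want) m++
        if (m == 0) next
        if (m > best) { best = m; count = 0 }   # new best score: drop earlier candidates
        if (m == best) {
            # Collect every ";;"-separated text of this line.
            p = split($2, t, ";;")
            for (j = 1; j <= p; j++) if (t[j] != "") cand[++count] = t[j]
        }
    }
    END {
        # Pick one candidate at random; print nothing if no line matched.
        if (count > 0) print cand[int(rand() * count) + 1]
    }' "$2"
}
```

Called as pick_response "truck big" file.txt, it considers only the "big truck" line, because that is the only line matching both words.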
