Reputation: 25
Good morning guys!
I have the following lines in file.txt:
big bird|Big birds tend to be big.;;They have wings.
big truck|I love big trucks!;;Ford makes nice big trucks.
red truck|Red is my favorite color.;;Red is also a name.
My two delimiters are the pipe (|) and the double semicolons (;;). My script takes an input, matches it to a line in file.txt BEFORE the pipe (|) and randomly returns a corresponding output after the pipe delimited by (;;).
read -p " " query
response="$(awk -F\| -v r="$query" '$1==r{print $2;exit}' file.txt | sed 's/;;/\n/g' | shuf -n1)"
echo "$response"
Example:
Input: big truck
Output: Ford makes nice big trucks.
But this only works when the input is an exact match. How can I modify the awk expression so that it accepts partial matches, or matches when the two words are in a different order (e.g. "truck big" instead of "big truck")?
Desired behavior matching "big" and "truck" and returning a random output from the line:
Input: some trucks are big
Output: Ford makes nice big trucks.
Many MANY thanks in advance!
Upvotes: 0
Views: 885
Reputation: 203169
It's not clear if you want a regexp or a string match, nor if you want a partial match on the whole of $1, on parts of the "words" in $1, on whole words in $1, or something else. The following does a whole-word string match on the parts of $1, as that seems to me to be what you're most likely asking for. You also didn't say how duplicates in the input or in the query string should be handled, so the following matches unique words (as opposed to counting occurrences of words, for example):
$ cat tst.sh
#!/usr/bin/env bash
read -p " " query
response="$(
    awk -v query="$query" '
        BEGIN {
            # collect the unique words of the query
            split(query,tmp)
            for (i in tmp) {
                targets[tmp[i]]
            }
            for (word in targets) {
                targetCnt++
            }
            FS = "[|]"
        }
        {
            # collect the unique words of the key (the text before the pipe)
            delete present
            split($1,tmp," ")
            for (i in tmp) {
                present[tmp[i]]
            }
            matchCnt = 0
            for (word in targets) {
                if (word in present) {
                    matchCnt++
                }
            }
            # every query word must be present in the key
            if ( targetCnt == matchCnt ) {
                sub(/.*;;/,"")      # greedy: keeps only the text after the last ;;
                print
            }
        }
    ' file |
    shuf |
    head -1
)"
printf '%s\n' "$response"
$ ./tst.sh
truck
Red is also a name.
$ ./tst.sh
truck
Ford makes nice big trucks.
$ ./tst.sh
truck big
Ford makes nice big trucks.
$ ./tst.sh
truck big
Ford makes nice big trucks.
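Note that sub(/.*;;/,"") is greedy, so the script above prints only the text after the last ;; of each matching line, which is why "I love big trucks!" never shows up in the runs above. If every ;;-separated response should be a candidate, as the question describes, a minimal sketch of a variant could look like the following. The function name pick_response and its arguments are made up for illustration; the matching is the same whole-word test as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): pick_response QUERY FILE
# Prints one random response from the lines whose key contains every query word.
pick_response() {
    awk -v query="$1" '
    BEGIN {
        FS = "[|]"
        n = split(query, tmp, " ")
        for (i = 1; i <= n; i++) {
            if (!(tmp[i] in targets)) {   # count each query word once
                targets[tmp[i]]
                targetCnt++
            }
        }
    }
    {
        split("", present)                # portable way to empty an array
        n = split($1, tmp, " ")
        for (i = 1; i <= n; i++) present[tmp[i]]
        matchCnt = 0
        for (word in targets) if (word in present) matchCnt++
        if (targetCnt > 0 && matchCnt == targetCnt) {
            # print every ;;-separated response on its own line
            n = split($2, resp, ";;")
            for (i = 1; i <= n; i++) print resp[i]
        }
    }
    ' "$2" | shuf -n1
}

# Example call, assuming file.txt from the question:
# pick_response "truck big" file.txt
```

With the sample data, a query of "truck big" can then return either "I love big trucks!" or "Ford makes nice big trucks.".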
Upvotes: 1
Reputation: 14900
script 'do.awk':
BEGIN {
    # index the words of the input string
    split(input,s," ");
    for (i in s) s2[s[i]]=i;
    srand();
}
{
    # count how many words of $1 occur in the input
    split($1,a," ");
    m=0;
    for (i in a) {
        if (a[i] in s2) m++;
    }
    # add if words match (m>0), first matchcount, then $2
    if (m>0) {
        r[z++]=m";;"$2
    }
}
END {
    # sort array r, last line will have highest matchcount
    n = asort(r);
    # print last value
    # print r[n];
    # get random piece, but exclude r2[1], because it is matchcount.
    x=split(r[n],r2,";;")-1;
    x = int(rand()*x);
    print r2[x+2];
}
awk -v input="big red" -F\| -f do.awk file.txt
will output "Red is also a name." or "Red is my favorite color."
I replaced the random selection, for which you used shuf, with awk's rand() function (I do hope it's random enough).
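On the "random enough" point: in gawk, srand() with no argument seeds from the current time in seconds, so two runs within the same second will typically make the same pick. One possible way around that, sketched here under the assumption that the script runs in bash (for $RANDOM), is to pass a seed in from the shell. The names pick_field and seed are invented for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): pick_field SEED TEXT
# Prints one randomly chosen ";;"-separated piece of TEXT.
pick_field() {
    awk -v seed="$1" -v text="$2" '
    BEGIN {
        srand(seed)                      # explicit seed instead of the clock
        n = split(text, parts, ";;")
        if (n > 0) print parts[int(rand() * n) + 1]
    }'
}

# A fresh seed per call; $RANDOM is a bash feature, not POSIX sh.
pick_field "$RANDOM" "Red is my favorite color.;;Red is also a name."
```

The same seed always gives the same pick, which also makes the selection easy to test.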
EDIT: I noticed you have another post about the RANDOM stuff here: BASH - Regex match line in file.txt with more than one delimiter
EDIT2: The matchcount can be the same for multiple lines. If it is, we should add the texts of all those lines and then pick a random one.
I changed the END section as follows:
END {
    # sort array r, last line(s) will have highest matchcount
    n = asort(r);
    split(r[n],t,";;");
    m = t[1];
    # add all texts with matchcount=m
    o="";
    for (i=length(r); i>=1; i--) {
        split(r[i],t,";;")
        if (t[1]==m) {
            for (j=2; j<=length(t); j++) {
                o=o";;"t[j];
            }
        } else {
            break;
        }
    }
    # get random piece, from o.
    x=split(o,r2,";;")-1;
    m=r2[1];
    x = int(rand()*x);
    print r2[x+2];
}
EDIT: Finally, I should have mentioned that asort() is only available in gawk.
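For awks without asort(), a rough sketch of the same idea is to skip the sort entirely: remember the highest matchcount seen so far and collect the texts of the lines that reach it, so that ties are pooled as in EDIT2. The function name best_match and the optional seed argument are assumptions added for illustration, and the wrapper assumes bash for $RANDOM.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): best_match QUERY FILE [SEED]
# Prints one random text from the line(s) sharing the most words with QUERY.
best_match() {
    awk -F'[|]' -v input="$1" -v seed="${3:-$RANDOM}" '
    BEGIN {
        srand(seed)
        n = split(input, s, " ")
        for (i = 1; i <= n; i++) want[s[i]]
    }
    {
        # count the words of $1 that occur in the input
        m = 0
        n = split($1, a, " ")
        for (i = 1; i <= n; i++) if (a[i] in want) m++
        if (m == 0) next
        if (m > best)       { best = m; pool = $2 }    # new best: restart the pool
        else if (m == best) { pool = pool ";;" $2 }    # tie: add the texts of this line
    }
    END {
        n = split(pool, out, ";;")
        if (n > 0) print out[int(rand() * n) + 1]
    }
    ' "$2"
}

# Example call, assuming file.txt from the question:
# best_match "red truck" file.txt
```

Comparing the counts numerically also avoids relying on string ordering of the "matchcount;;text" entries.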
Upvotes: 0