Reputation: 951
Suppose you have a recipe text file called recipes.yml:
Margherita:
  cheese
  tomato

Chicken Supreme:
  cheese
  onions
  chicken
  mushrooms

Veggie:
  cheese
  spinach
  sweetcorn
  peppers
  mushrooms
  onions

Potato:
  cheese
  potato
  oregano
Now I would like to find any pizza that contains either cheese, onion or rucola. I will put my search terms into another file:
$ cat terms.txt
cheese
onion
rucola
$ while read -r line; do echo "searching pizza containing: $line" && SEARCH $line IN recipes.yml; done <terms.txt
searching pizza containing: cheese
found 4
Margherita
Chicken Supreme
Veggie
Potato
searching pizza containing: onion
found 2
Chicken Supreme
Veggie
searching pizza containing: rucola
found 0
Maybe this is too much to do in bash, but I would really like to know if it is possible at all. I am stuck right now. I can't seem to find a way to capture the name of the pizza when the ingredient is found. Here are some half-way attempts using grep, awk and sed:
I have only been able to find commands to let me find the number of occurrences of each search term and on what line the match is located in the file. Like this:
$ while read -r "line"; do echo "searching pizza containing: $line" && grep -c "$line" recipes.yml && grep -n "$line" recipes.yml; done <terms.txt
searching pizza containing: cheese
4
2: cheese
6: cheese
12: cheese
20: cheese
searching pizza containing: onion
2
7: onions
17: onions
searching pizza containing: rucola
0
and with awk and sed:
$ while read -r "line"; do echo "searching pizza containing: $line" && awk -v avar="$line" '$0 ~ avar {count++} END {print count}' recipes.yml && sed -n "/$line/p" recipes.yml; done <terms.txt
searching pizza containing: cheese
4
cheese
cheese
cheese
cheese
searching pizza containing: onion
2
onions
onions
searching pizza containing: rucola
Upvotes: 0
Views: 75
Reputation: 84561
First, you would never produce the output shown with "onion" in your terms.txt and "onions" in recipes.yml. (It took more than a minute to sort that typo out.)
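Whether "onion" finds "onions" depends on the kind of match used. As a minimal sketch on a made-up three-line ingredient list (not the full recipes.yml), a substring match such as plain grep counts it, while a whole-line comparison does not:

```shell
# Hypothetical sample data, not the asker's full file
ingredients=$(printf '%s\n' cheese onions chicken)

# Substring (regex) match: "onion" is found inside "onions"
substring_count=$(printf '%s\n' "$ingredients" | grep -c 'onion' || true)

# Whole-line match (-x): "onion" is not equal to "onions"
# grep exits non-zero when nothing matches, hence the "|| true"
exact_count=$(printf '%s\n' "$ingredients" | grep -cx 'onion' || true)

echo "substring: $substring_count, exact: $exact_count"
```

This prints "substring: 1, exact: 0", which is why an exact comparison needs the term spelled the same way in both files.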
Rule 1: always defer to @EdMorton for the most efficient and validated scripts. That said, a more procedural approach may help what is happening sink in a bit. The awk script below first trims whitespace from each record and then has four rules. The first, guarded by NR == FNR && NF, simply ensures that the rule is applied to the first file only and only to non-blank lines. The second, guarded by $0 ~ /:$/, ensures the current record ends in ':'. The third rule applies to all other non-blank lines in the second file. Finally, the END rule just prints the results.
awk '
    NF { $1 = $1 }                  # non-blank line: rebuild record to trim whitespace
                                    # (an unguarded $1 = $1 would set NF to 1 on blank lines)
    NR == FNR && NF {               # first file and non-blank line
        a[++n] = $0                 # add term to indexed a[]
        next                        # skip to next record
    }
    $0 ~ /:$/ {                     # second file and line ends in ":"
        pizza = $0                  # set pizza name
        next                        # skip to next record
    }
    NF {                            # second file and non-blank line
        for (i=1; i<=n; i++) {      # loop over a[] array, check against terms
            if ($0 == a[i]) {       # if line matches term exactly
                found[$0]++         # increment the found count
                c[$0] = c[$0]pizza"\n"    # concatenate pizza to c[] capture array
            }
        }
    }
    END {                           # end rule
        for (i=1; i<=n; i++) {      # loop over terms, output count and pizzas
            printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
            printf "%s", c[a[i]]
        }
    }
' terms.txt recipes.yml
Example Use/Output
With your data in terms.txt and recipes.yml, you can simply select-copy and middle-mouse paste the script into an xterm with the files in the current directory to test, e.g.
$ awk '
>     NF { $1 = $1 }                  # non-blank line: rebuild record to trim whitespace
>     NR == FNR && NF {               # first file and non-blank line
>         a[++n] = $0                 # add term to indexed a[]
>         next                        # skip to next record
>     }
>     $0 ~ /:$/ {                     # second file and line ends in ":"
>         pizza = $0                  # set pizza name
>         next                        # skip to next record
>     }
>     NF {                            # second file and non-blank line
>         for (i=1; i<=n; i++) {      # loop over a[] array, check against terms
>             if ($0 == a[i]) {       # if line matches term exactly
>                 found[$0]++         # increment the found count
>                 c[$0] = c[$0]pizza"\n"    # concatenate pizza to c[] capture array
>             }
>         }
>     }
>     END {                           # end rule
>         for (i=1; i<=n; i++) {      # loop over terms, output count and pizzas
>             printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
>             printf "%s", c[a[i]]
>         }
>     }
> ' terms.txt recipes.yml
searching pizza containing: cheese
found 4
Margherita:
Chicken Supreme:
Veggie:
Potato:
searching pizza containing: onions
found 2
Chicken Supreme:
Veggie:
searching pizza containing: rucola
found 0
Let me know if you have further questions, and compare the efficiencies @EdMorton incorporated.
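If you want the singular term "onion" to match "onions" and the pizza names printed without the trailing colon, as in the output the question asks for, one possible variation is sketched below. It is only a sketch under assumptions: it swaps the exact comparison for a substring test with index(), which would also match unrelated words that merely contain a term, and it runs on trimmed sample data written to a hypothetical scratch directory rather than on your real files.

```shell
# Hypothetical scratch directory and trimmed sample data
tmpdir=$(mktemp -d)
printf '%s\n' cheese onion rucola > "$tmpdir/terms.txt"
printf '%s\n' 'Margherita:' '  cheese' '  tomato' '' \
              'Veggie:' '  cheese' '  onions' > "$tmpdir/recipes.yml"

result=$(awk '
    !NF { next }                         # skip blank lines in both files
    { $1 = $1 }                          # rebuild record to trim whitespace
    NR == FNR { a[++n] = $0; next }      # first file: store each term
    /:$/ {                               # second file: pizza name line
        pizza = $0
        sub(/:$/, "", pizza)             # drop the trailing colon
        next
    }
    {                                    # second file: ingredient line
        for (i = 1; i <= n; i++)
            if (index($0, a[i])) {       # substring test, "onion" hits "onions"
                found[a[i]]++
                c[a[i]] = c[a[i]] pizza "\n"
            }
    }
    END {
        for (i = 1; i <= n; i++) {
            printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
            printf "%s", c[a[i]]
        }
    }
' "$tmpdir/terms.txt" "$tmpdir/recipes.yml")

echo "$result"
rm -rf "$tmpdir"
```

On this sample it reports cheese in Margherita and Veggie, onion in Veggie, and "found 0" for rucola. The counts are keyed by the search term rather than by the ingredient line, so the END rule can look them up with the term as written in terms.txt.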
Upvotes: 3
Reputation: 203577
$ cat tst.awk
NR==FNR {
    count[$1] = 0
    next
}
/^[^[:space:]]/ {
    sub(/:.*/,"")
    type = $0
    next
}
$1 in count || ( sub(/s$/,"",$1) && ($1 in count) ) {
    types[$1] = (count[$1]++ ? types[$1] ORS : "") " " type
}
END {
    for (term in count) {
        print "searching pizza containing:", term
        print "found", count[term]
        if ( count[term] != 0 ) {
            print types[term]
        }
    }
}
$ awk -f tst.awk terms.txt recipes.yml
searching pizza containing: rucola
found 0
searching pizza containing: cheese
found 4
Margherita
Chicken Supreme
Veggie
Potato
searching pizza containing: onion
found 2
Chicken Supreme
Veggie
Upvotes: 2