Paul H

Reputation: 951

How do I search sequentially for multiple strings in the same file?

Problem

Suppose you have a recipe text file called recipes.yml:

Margherita:
  cheese
  tomato

Chicken Supreme:
  cheese
  onions
  chicken
  mushrooms

Veggie:
  cheese
  spinach
  sweetcorn
  peppers
  mushrooms
  onions

Potato:
  cheese
  potato
  oregano

Now I would like to find any pizza that contains either cheese, onion or rucola. I will put my search terms into another file:

$ cat terms.txt
cheese
onion
rucola

Desired output

$ while read -r line; do echo "searching pizza containing: $line" && SEARCH $line IN recipes.yml; done <terms.txt
searching pizza containing: cheese
found 4
  Margherita
  Chicken Supreme
  Veggie
  Potato
searching pizza containing: onion
found 2
  Chicken Supreme
  Veggie
searching pizza containing: rucola
found 0

Maybe this is too much to do in bash, but I would really like to know if it is possible at all. I am stuck right now: I can't seem to find a way to capture the name of the pizza once the ingredient is found. Here are some half-way attempts using grep, awk and sed:

Attempts

I have only been able to find commands that give me the number of occurrences of each search term and the line on which each match is located in the file. Like this:

$ while read -r "line"; do echo "searching pizza containing: $line" && grep -c "$line" recipes.yml && grep -n "$line" recipes.yml; done <terms.txt
searching pizza containing: cheese
4
2:  cheese
6:  cheese
12:  cheese
20:  cheese
searching pizza containing: onion
2
7:  onions
17:  onions
searching pizza containing: rucola
0

And with awk and sed:

$ while read -r "line"; do echo "searching pizza containing: $line" && awk -v avar="$line" '$0 ~ avar {count++} END {print count}' recipes.yml && sed -n "/$line/p" recipes.yml; done <terms.txt
searching pizza containing: cheese
4
  cheese
  cheese
  cheese
  cheese
searching pizza containing: onion
2
  onions
  onions
searching pizza containing: rucola

Upvotes: 0

Views: 75

Answers (2)

David C. Rankin

Reputation: 84561

First, an exact match would never produce the output shown with "onion" in your terms.txt and "onions" in recipes.yml (it took more than a minute to sort that typo out). The example output below was produced with "onions" in terms.txt.

Rule 1: always defer to @EdMorton for the most efficient and validated scripts. That said, a more procedural approach may help what is happening sink in a bit. The awk script below starts with an unconditional action that rebuilds each record to trim whitespace, followed by four rules. The first, guarded by NR == FNR && NF, simply ensures that the rule is applied to the first file only, and only to non-blank lines. The second, guarded by $0 ~ /:$/, ensures the current record ends in ':' (a pizza name). The third rule applies to all other non-blank lines in the second file. Finally, the END rule just prints the results.

awk '
    { $1 = $1 }                         # recalculate records to remove whitespace
    NR == FNR && NF {                   # first file and non-blank line
        a[++n] = $0                     # add term to indexed a[]
        next                            # skip to next record
    }
    $0 ~ /:$/ {                         # second file and line ends in ':'
        pizza = $0                      # set pizza name
        next                            # skip to next record
    }
    NF {                                # second file and non-blank line
        for (i=1; i<=n; i++) {          # loop over a[] array check against terms
            if ($0 == a[i]) {           # if line matches term
                found[$0]++             # increment the found count 
                c[$0] = c[$0]pizza"\n"  # concatenate pizza to c[] capture array
            }
        }
    }
    END {                               # end rule
        for (i=1; i<=n; i++) {          # loop over terms, output count and pizzas 
            printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
            printf "%s", c[a[i]]
        }
    }
' terms.txt recipes.yml

Example Use/Output

With your data in terms.txt and recipes.yml, you can simply select-copy the script and middle-mouse paste it into an xterm with the files in the current directory to test, e.g.

$ awk '
>     { $1 = $1 }                         # recalculate records to remove whitespace
>     NR == FNR && NF {                   # first file and non-blank line
>         a[++n] = $0                     # add term to indexed a[]
>         next                            # skip to next record
>     }
>     $0 ~ /:$/ {                         # second file and line ends in ':'
>         pizza = $0                      # set pizza name
>         next                            # skip to next record
>     }
>     NF {                                # second file and non-blank line
>         for (i=1; i<=n; i++) {          # loop over a[] array check against terms
>             if ($0 == a[i]) {           # if line matches term
>                 found[$0]++             # increment the found count
>                 c[$0] = c[$0]pizza"\n"  # concatenate pizza to c[] capture array
>             }
>         }
>     }
>     END {                               # end rule
>         for (i=1; i<=n; i++) {          # loop over terms, output count and pizzas
>             printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
>             printf "%s", c[a[i]]
>         }
>     }
> ' terms.txt recipes.yml
searching pizza containing: cheese
found 4
Margherita:
Chicken Supreme:
Veggie:
Potato:
searching pizza containing: onions
found 2
Chicken Supreme:
Veggie:
searching pizza containing: rucola
found 0
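
The script above compares each ingredient line to each term with ==, so a term of "onion" never matches "onions". As a minimal sketch of one alternative (not the method used in this answer), the comparison can be swapped for a substring test with index(), which lets the original terms.txt be used unchanged. The scratch directory and the inlined sample data are assumptions for illustration only; the sketch also strips the trailing ':' from the pizza name and indents it to match the desired output in the question.

```shell
#!/bin/sh
# Assumption: work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Sample data copied from the question ("onion", not "onions").
printf '%s\n' cheese onion rucola > terms.txt
cat > recipes.yml <<'EOF'
Margherita:
  cheese
  tomato

Chicken Supreme:
  cheese
  onions
  chicken
  mushrooms

Veggie:
  cheese
  spinach
  sweetcorn
  peppers
  mushrooms
  onions

Potato:
  cheese
  potato
  oregano
EOF

out=$(awk '
    { $1 = $1 }                              # rebuild record, trimming whitespace
    NR == FNR && NF { a[++n] = $0; next }    # first file: store terms in order
    /:$/ {                                   # second file: pizza name line
        pizza = $0
        sub(/:$/, "", pizza)                 # drop the trailing colon
        next
    }
    NF {                                     # second file: ingredient line
        for (i = 1; i <= n; i++)
            if (index($0, a[i])) {           # substring test instead of ==
                found[a[i]]++                # count is keyed on the term
                c[a[i]] = c[a[i]] "  " pizza "\n"
            }
    }
    END {
        for (i = 1; i <= n; i++) {
            printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
            printf "%s", c[a[i]]
        }
    }
' terms.txt recipes.yml)
printf '%s\n' "$out"
```

A substring test is deliberately loose: a term such as "corn" would also match "sweetcorn", so it only suits term lists where that is acceptable.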

Let me know if you have further questions, and compare the efficiencies @EdMorton incorporated.

Upvotes: 3

Ed Morton

Reputation: 203577

$ cat tst.awk
NR==FNR {
    count[$1] = 0
    next
}
/^[^[:space:]]/ {
    sub(/:.*/,"")
    type = $0
    next
}
$1 in count || ( sub(/s$/,"",$1) && ($1 in count) ) {
    types[$1] = (count[$1]++ ? types[$1] ORS : "") "  " type
}
END {
    for (term in count) {
        print "searching pizza containing:", term
        print "found", count[term]
        if ( count[term] != 0 ) {
            print types[term]
        }
    }
}

$ awk -f tst.awk terms.txt recipes.yml
searching pizza containing: rucola
found 0
searching pizza containing: cheese
found 4
  Margherita
  Chicken Supreme
  Veggie
  Potato
searching pizza containing: onion
found 2
  Chicken Supreme
  Veggie
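
Note that the terms above are not printed in the order they appear in terms.txt: awk does not specify the iteration order of for (term in count). If the input order matters, one possible variation (a sketch, not part of the script above) is to record each term in an extra indexed array while reading terms.txt and loop over that in the END rule. The array name terms[], the counter numTerms, the scratch directory and the inlined sample data are all assumptions made for this illustration.

```shell
#!/bin/sh
# Assumption: work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Sample data copied from the question.
printf '%s\n' cheese onion rucola > terms.txt
cat > recipes.yml <<'EOF'
Margherita:
  cheese
  tomato

Chicken Supreme:
  cheese
  onions
  chicken
  mushrooms

Veggie:
  cheese
  spinach
  sweetcorn
  peppers
  mushrooms
  onions

Potato:
  cheese
  potato
  oregano
EOF

out=$(awk '
NR==FNR {
    if ( !($1 in count) ) {
        terms[++numTerms] = $1     # hypothetical extra array: remembers input order
    }
    count[$1] = 0
    next
}
/^[^[:space:]]/ {                  # unindented line: pizza name
    sub(/:.*/,"")
    type = $0
    next
}
$1 in count || ( sub(/s$/,"",$1) && ($1 in count) ) {
    types[$1] = (count[$1]++ ? types[$1] ORS : "") "  " type
}
END {
    for (i=1; i<=numTerms; i++) {  # indexed loop gives a defined order
        term = terms[i]
        print "searching pizza containing:", term
        print "found", count[term]
        if ( count[term] != 0 ) {
            print types[term]
        }
    }
}
' terms.txt recipes.yml)
printf '%s\n' "$out"
```

With the sample data this prints the cheese block first, then onion, then rucola, which matches the order of the desired output in the question.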

Upvotes: 2
