Joe T

Reputation: 11

Finding presence of substring within a string in BASH

I have a script that tries to find whether a given string appears inside a file of arbitrary text.

I've settled on something like:

#!/bin/bash
file="myfile.txt"
for j in `cat blacklist.txt`; do
  echo Searching for $j...
  unset match
  match=`grep -i -m1 -o "$j" $file`
  if [ $match ]; then
    echo "Match: $match"
  fi
done

blacklist.txt contains lines of potential matches, like so:

matchthis
"match this too"
thisisasingleword
"This is multiple words"

myfile.txt could be something like:

I would matchthis if I could match things with grep.  I really wish I could. 
When I ask it to match this too, it fails to matchthis.  It should match this too - right?

If I run this at a bash prompt, like so:

j="match this too"
grep -i -m1 -o "$j" myfile.txt

...I get "match this too".

However, when the script runs, grep never finds anything, even though the variables are set correctly (I verified them with echo lines).

Where am I going wrong?

Upvotes: 0

Views: 137

Answers (4)

Piotr Henryk Dabrowski

Reputation: 901

With this:

if [ $match ]; then

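you are passing whatever words the unquoted $match splits into as separate arguments to test. A multi-word result such as "match this too" becomes three arguments, and test fails with an error instead of succeeding. To check whether a variable is non-empty, quote it and use test -n: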

if [ -n "$match" ]; then
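A minimal sketch of the difference, assuming bash. The value of match is hard-coded here as a stand-in for what grep would return on a multi-word hit:

```bash
#!/bin/bash
# Hard-coded stand-in for a multi-word grep result.
match="match this too"

# Unquoted: word splitting turns this into [ match this too ],
# which is not a valid test expression, so the else branch runs
# (bash's error message is silenced here).
if [ $match ] 2>/dev/null; then
    unquoted=true
else
    unquoted=false
fi

# Quoted with -n: test receives exactly one argument and checks it is non-empty.
if [ -n "$match" ]; then
    quoted=true
else
    quoted=false
fi

echo "unquoted: $unquoted, quoted: $quoted"
```

With the quoted form, an empty result still correctly tests as false.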

You could also use grep's exit status instead, provided you check $? immediately after the match=... assignment:

if [ "$?" -eq 0 ]; then
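$? holds the status of the most recent command, so nothing may run between the assignment and the check. A sketch, assuming a temporary sample file created only for this demo; the second if shows the common alternative of letting if test grep directly, with -q suppressing output:

```bash
#!/bin/bash
# Temporary sample input, created only for this demo.
tmpfile=$(mktemp)
printf '%s\n' 'When I ask it to match this too, it fails.' > "$tmpfile"

j="match this too"

# The exit status of match=$(...) is grep's exit status,
# as long as no other command runs before $? is read.
match=$(grep -i -m1 -o -- "$j" "$tmpfile")
if [ "$?" -eq 0 ]; then
    echo "Match: $match"
fi

# Alternative: test grep's status directly.
found=no
if grep -qi -- "$j" "$tmpfile"; then
    found=yes
fi

rm -f "$tmpfile"
```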

for ... in X splits X on any whitespace (spaces, tabs, and newlines) by default, but you expect each iteration to get a whole line.
Set IFS to a newline only:

IFS='
'
for j in `cat blacklist.txt`; do
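A small sketch of the effect, assuming bash. The two-line list is hard-coded demo data standing in for blacklist.txt, and IFS=$'\n' is bash's ANSI-C quoting for the same newline-only value written with a literal newline above:

```bash
#!/bin/bash
# Two-line stand-in for the contents of blacklist.txt (demo data only).
list=$'matchthis\nmatch this too'

# Default IFS (space, tab, newline): the second line is split into three words.
default_count=0
for j in $list; do
    default_count=$((default_count + 1))
done

# IFS set to a newline only: each line stays a single item.
saved_ifs=$IFS
IFS=$'\n'
line_count=0
for j in $list; do
    line_count=$((line_count + 1))
done
IFS=$saved_ifs

echo "default IFS: $default_count items; newline IFS: $line_count items"
```

Saving and restoring IFS keeps the change from affecting the rest of the script.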

blacklist.txt contains "match this too" with the quotes, and the for loop reads the quotes as part of the text, so grep searches for them literally.

j="match this too" does not make the j variable contain quotes;
j='"match this too"' does, and then it will not match.
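A sketch of that difference against a one-line sample file, created in a temporary location only for this demo:

```bash
#!/bin/bash
# Temporary sample input, created only for this demo.
tmpfile=$(mktemp)
printf '%s\n' 'It should match this too - right?' > "$tmpfile"

# The double quotes are part of this value, so grep looks for them literally.
j='"match this too"'
if grep -qi -- "$j" "$tmpfile"; then with_quotes=match; else with_quotes=none; fi

# Here the quotes are only shell syntax; the value is the bare phrase.
j="match this too"
if grep -qi -- "$j" "$tmpfile"; then without_quotes=match; else without_quotes=none; fi

rm -f "$tmpfile"
echo "with quotes: $with_quotes; without quotes: $without_quotes"
```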

Now that whole lines are read from blacklist.txt, you can remove the quotes from that file.

Script:

#!/bin/bash
file="myfile.txt"
IFS='
'
for j in `cat blacklist.txt`; do
  echo "Searching for $j..."
  unset match
  match=`grep -i -m1 -o "$j" "$file"`
  if [ -n "$match" ]; then
    echo "Match: $match"
  fi
done
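A self-contained run of the script above on the question's sample data, as a sketch. The temporary directory and file contents are demo setup (with the quotes already removed from blacklist.txt), the loop runs in a subshell so cd and IFS do not leak out, and the "Searching for" lines are left out to keep the output short:

```bash
#!/bin/bash
# Demo setup: the question's sample files in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'matchthis' 'match this too' 'thisisasingleword' \
    'This is multiple words' > "$workdir/blacklist.txt"
printf '%s\n' \
    'I would matchthis if I could match things with grep.  I really wish I could.' \
    'When I ask it to match this too, it fails to matchthis.  It should match this too - right?' \
    > "$workdir/myfile.txt"

# The answer's loop, run in a subshell.
output=$(
    cd "$workdir" || exit 1
    file="myfile.txt"
    IFS=$'\n'   # same value as the literal newline in the answer
    for j in `cat blacklist.txt`; do
        match=`grep -i -m1 -o "$j" "$file"`
        if [ -n "$match" ]; then
            echo "Match: $match"
        fi
    done
)

rm -rf "$workdir"
printf '%s\n' "$output"
```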

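Alternative to the for ... in ... loop (no global IFS change needed: IFS= applies only to read, and -r keeps backslashes in each line as written):

while IFS= read -r j; do
    echo "Searching for $j..."
    match=$(grep -i -m1 -o "$j" "$file")
    if [ -n "$match" ]; then
        echo "Match: $match"
    fi
done < 'blacklist.txt'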

Upvotes: 0

user1934428

Reputation: 22225

Instead of writing an inefficient loop, wouldn't this do what you want?

grep -owF -f blacklist.txt myfile.txt
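A sketch of that single-grep approach on the question's sample data. Two adjustments are assumptions, not part of the answer: -i is added to keep the question's case-insensitive search, and sed strips the surrounding double quotes from each blacklist line (passed to grep through bash process substitution) so they are not matched literally. The files are created in a temporary directory only for the demo:

```bash
#!/bin/bash
# Demo setup: the question's sample files, quotes left in blacklist.txt.
workdir=$(mktemp -d)
printf '%s\n' 'matchthis' '"match this too"' 'thisisasingleword' \
    '"This is multiple words"' > "$workdir/blacklist.txt"
printf '%s\n' \
    'I would matchthis if I could match things with grep.  I really wish I could.' \
    'When I ask it to match this too, it fails to matchthis.  It should match this too - right?' \
    > "$workdir/myfile.txt"

# One grep for all patterns:
#   -f FILE  read one pattern per line
#   -F       treat patterns as fixed strings, not regexes
#   -w       match whole words only
#   -o       print only the matched text
#   -i       ignore case (added here to mirror the question)
output=$(grep -iowF -f <(sed 's/^"//; s/"$//' "$workdir/blacklist.txt") "$workdir/myfile.txt")

rm -rf "$workdir"
printf '%s\n' "$output"
```

This prints every match in file order rather than reporting per blacklist entry.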

Upvotes: 2

Joe T

Reputation: 11

I wound up abandoning grep entirely and using sed instead.

match=$(sed -n "s/.*\($j\).*/\1/p" "$file")

Works well, and I was able to use unquoted multiple-word phrases in the blacklist file.
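A minimal check of the sed line on the question's sample text (written to a temporary file only for the demo). Note that, unlike grep -i, this sed command is case-sensitive, and $j is interpreted as a basic regular expression, so blacklist entries containing characters such as / . * [ would need escaping. Because the leading .* is greedy, each matching line yields a single capture, so the phrase is reported once even though it appears twice on that line:

```bash
#!/bin/bash
# Temporary sample input, created only for this demo.
tmpfile=$(mktemp)
printf '%s\n' \
    'I would matchthis if I could match things with grep.  I really wish I could.' \
    'When I ask it to match this too, it fails to matchthis.  It should match this too - right?' \
    > "$tmpfile"

j="match this too"
# -n suppresses normal output; the p flag prints only lines where s/// matched,
# after the whole line has been replaced by the captured group \1.
match=$(sed -n "s/.*\($j\).*/\1/p" "$tmpfile")

rm -f "$tmpfile"
echo "Match: $match"
```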

Upvotes: 0

tshiono

Reputation: 22012

Would you please try:

#!/bin/bash

file="myfile.txt"

while IFS= read -r j; do
    j=${j#\"}; j=${j%\"}                        # remove surrounding double quotes
    echo "Searching for $j..."

    match=$(grep -i -m1 -o "$j" "$file")
    if (( $? == 0 )); then                      # if match
        echo "Match: $match"                    # then print it
    fi
done < blacklist.txt

Output:

Searching for matchthis...
Match: matchthis
Searching for match this too...
Match: match this too
match this too
Searching for thisisasingleword...
Searching for This is multiple words...
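In the output above, "match this too" is printed twice because -m1 stops grep after the first matching line, while -o still prints every match on that line, and the second line of myfile.txt contains the phrase twice. A quick check, with a made-up third line that -m1 never reaches:

```bash
#!/bin/bash
# Three input lines: no match, two matches, one match (the last line is made up).
matches=$(printf '%s\n' \
    'I would matchthis if I could match things with grep.' \
    'When I ask it to match this too, it fails to matchthis.  It should match this too - right?' \
    'One more line to match this too.' |
    grep -i -m1 -o 'match this too')

printf '%s\n' "$matches"
```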

Upvotes: 0
