Chris

Reputation: 1249

Does array 1 contain any of the strings in array 2

I am trying to write an if statement that prints "match" if array1 contains any of the strings in array2, and "no match" otherwise.

So far I have the following. Not sure how to complete it. Both loops should break as soon as the first match is found.

#!/bin/bash

array1=(a b c 1 2 3)
array2=(b 1)

for a in "${array1[@]}"
do
    for b in "${array2[@]}"
    do
        if [ "$a" == "$b" ]; then
            echo "Match!"
            break
        fi
    done
done

Maybe this isn't even the best way to do it?

This illustrates the desired result

if [ array1 contains strings in array2 ]
then
  echo "match"
else
  echo "no match"
fi

Upvotes: 0

Views: 110

Answers (4)

Socowi

Reputation: 27330

To check whether array1 contains any entry from array2 you can use grep. This is typically much faster, and shorter, than looping in bash.

The following commands exit with status code 0 if and only if there is a match. Use them as ...

if COMMAND FROM BELOW; then
   echo match
else
   echo no match
fi

Single-Line Array Entries

The simple version for strings without linebreaks is

printf %s\\n "${array1[@]}" | grep -qFxf <(printf %s\\n "${array2[@]}")
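Put together with the if template above, a minimal sketch could look like the following. The result variable is only there for illustration; the sample arrays are the ones from the question.

```bash
#!/bin/bash
array1=(a b c 1 2 3)
array2=(b 1)

# -q: no output, exit status only
# -F: treat patterns as fixed strings, not regular expressions
# -x: a pattern must match a whole line
# -f: read the patterns, one per line, from the process substitution
if printf '%s\n' "${array1[@]}" | grep -qFxf <(printf '%s\n' "${array2[@]}"); then
    result="match"
else
    result="no match"
fi
echo "$result"
```

With the sample data this prints match, because b and 1 appear in both arrays.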

Multiline Array Entries

Sadly there doesn't seem to be a straightforward way to make this work for array entries with linebreaks. GNU grep has the option -z to set the "line" delimiter of the input to null, but apparently no option to do the same for the file provided to -f. Listing the entries from array2 as -e arguments to grep does not work either: grep -F seems to be unable to match multiline patterns. However, we can use the following hack:

printf %q\\n "${array1[@]}" | grep -qFxf <(printf %q\\n "${array2[@]}")

Here we assume that bash's built-in printf %q always prints a unique single line for each entry, which it currently does. However, future implementations of bash may change this. The documentation (help printf) only states that the output has to be correctly quoted for bash.
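To see why the %q variant matters, here is a small sketch comparing both versions on an entry that contains a linebreak. The sample arrays and the plain/quoted variable names are made up for this illustration.

```bash
#!/bin/bash
array1=(x y)
array2=($'x\ny')    # a single entry containing a linebreak

# With %s the multiline entry is split into the two patterns "x" and "y",
# so grep reports a match although the arrays share no entry.
if printf '%s\n' "${array1[@]}" | grep -qFxf <(printf '%s\n' "${array2[@]}"); then
    plain="match"
else
    plain="no match"
fi

# With %q the entry is printed on one line as $'x\ny', which equals
# neither "x" nor "y".
if printf '%q\n' "${array1[@]}" | grep -qFxf <(printf '%q\n' "${array2[@]}"); then
    quoted="match"
else
    quoted="no match"
fi

echo "%s: $plain, %q: $quoted"
```

The %s version gives a false positive here, while the %q version correctly reports no match.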

Upvotes: 3

tshiono

Reputation: 22042

Please try the following:

array1=(a b c 1 2 3)
array2=(b 1)

declare -A seen    # mark the elements of array2
for b in "${array2[@]}"; do
    (( seen[$b]++ ))
done

for a in "${array1[@]}"; do
    (( ${seen[$a]} )) && echo "match" && exit
done
echo "no match"

This avoids the double loop and so may be more efficient, although discussing efficiency may be meaningless as long as we are using bash :)
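If you want to reuse the lookup without ending the script, the same idea can be wrapped in a function that returns a status. This is only a sketch: have_common is a hypothetical name, and it assumes bash 4.3 or newer for namerefs (local -n) and associative arrays.

```bash
#!/bin/bash
# Hypothetical helper: returns 0 if the two named arrays share an element.
# Assumes bash 4.3+ (namerefs and associative arrays).
have_common() {
    local -n _first=$1 _second=$2
    local -A _seen=()
    local item

    # Mark every element of the second array.
    for item in "${_second[@]}"; do
        _seen["x$item"]=1    # the "x" prefix avoids an empty subscript
    done

    # Stop at the first element of the first array that is marked.
    for item in "${_first[@]}"; do
        [[ -n ${_seen["x$item"]:-} ]] && return 0
    done
    return 1
}

array1=(a b c 1 2 3)
array2=(b 1)

if have_common array1 array2; then
    echo "match"
else
    echo "no match"
fi
```

Storing a plain 1 instead of incrementing inside (( )) keeps the array entries out of arithmetic evaluation, which should be safer for arbitrary strings.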

Upvotes: 0

paxdiablo

Reputation: 882326

For a fast solution, you're better off using an external tool that can process the entire array as a whole (such as the grep-based answers). Doing nested loops in pure bash is likely to be slower for any substantial amount of data (where the item-by-item processing in bash is likely to be more expensive than the external process start-up time).

However, if you do need a pure bash solution, I see that your current solution has no way to print out the "no match" scenario. In addition, it may print out "match" multiple times.

To fix that, you can just store the fact that a match has been found, and use that to both:

  • exit the outer loop early as well as the inner loop; and
  • print the correct string at the end.

To do this, you can use something like:

#!/bin/bash

# Test data.

array1=(a b c 1 2 3)
array2=(b 1)

# Default to not-found state.

foundMatch=false
for a in "${array1[@]}" ; do
    for b in "${array2[@]}" ; do
        # Any match switches to found state and exits inner loop.

        [[ "$a" == "$b" ]] && foundMatch=true && break
    done

    # If found, exit outer loop as well.

    ${foundMatch} && break
done

# Output appropriate message for found/not-found state.

$foundMatch && echo "Match" || echo "No match"
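As a variation on the flag approach above, bash's break accepts a level count, so break 2 leaves both loops at once. A minimal sketch, with result as an illustrative variable name:

```bash
#!/bin/bash
array1=(a b c 1 2 3)
array2=(b 1)

# Default to the not-found message.
result="no match"
for a in "${array1[@]}"; do
    for b in "${array2[@]}"; do
        if [[ "$a" == "$b" ]]; then
            result="match"
            break 2    # exit the inner and the outer loop together
        fi
    done
done

echo "$result"
```

This removes the need for the second check after the inner loop, at the cost of a slightly less obvious control flow.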

Upvotes: 1

KamilCuk

Reputation: 141698

For array elements which do not contain newlines, grep -qf with printf "%s\n" would be a good option. For comparing arrays with arbitrary elements, I ended up with this:

cmp -s /dev/null <(comm -z12 <(printf "%s\0" "${array1[@]}" | sort -z) <(printf "%s\0" "${array2[@]}" | sort -z))

The printf "%s\0" "${array[@]}" | sort -z pipeline prints a sorted list of zero-terminated array elements. comm -z12 then extracts the elements common to both lists. cmp -s /dev/null checks whether the output of comm is empty; it is not empty if any element is in both lists, in which case cmp exits with a non-zero status. You could use [ -z "$(comm -z ...)" ] to check whether the output of comm is empty, but bash will complain that the output of a command captured with $(..) contains a null byte, so it's better to use cmp -s /dev/null.

I think | is faster than <(), so your if could be:

if ! printf "%s\0" "${array1[@]}" | sort -z |
     comm -z12 - <(printf "%s\0" "${array2[@]}" | sort -z) |
     cmp -s /dev/null -; then
        echo "Some elements are in both array1 and array2"
fi
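If you also want to know which elements are shared, the output of comm can be read back into an array instead of being handed to cmp. This is a sketch that assumes bash 4.4 or newer (for mapfile -d) and GNU sort and comm (for -z); the common array name is made up for the example.

```bash
#!/bin/bash
array1=(a b c 1 2 3)
array2=(b 1)

# Use one fixed locale so that sort and comm agree on the ordering.
export LC_ALL=C

# Read the zero-terminated common elements into the array "common".
mapfile -d '' -t common < <(
    comm -z12 <(printf '%s\0' "${array1[@]}" | sort -z) \
              <(printf '%s\0' "${array2[@]}" | sort -z)
)

if (( ${#common[@]} > 0 )); then
    echo "match: ${common[*]}"
else
    echo "no match"
fi
```

Because mapfile splits on the null byte, no command substitution is involved and bash has nothing to complain about.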

The following could work:

printf "%s\0" "${array1[@]}" | eval grep -qzFx "$(printf " -e %q" "${array2[@]}")"

But I believe I found a bug in grep v3.1 when matching a newline character with the -x flag. If you don't use the newline character, the above line works.
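The eval can probably be avoided by collecting the -e options in an array first. A sketch under the same restriction (no newlines in the entries) and assuming GNU grep for -z and a non-empty array2; patterns and result are illustrative names.

```bash
#!/bin/bash
array1=(a b c 1 2 3)
array2=(b 1)

# Build one "-e PATTERN" pair per entry of array2.
patterns=()
for b in "${array2[@]}"; do
    patterns+=(-e "$b")
done

# -z: input records are zero-terminated; -F -x: whole-record fixed-string match.
if printf '%s\0' "${array1[@]}" | grep -qzFx "${patterns[@]}"; then
    result="match"
else
    result="no match"
fi
echo "$result"
```

Since the entries are passed as separate arguments, they are never re-parsed by the shell, so no %q quoting is needed.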

Upvotes: 0
