Using "comm" to find matches between two arrays

Question

I have two arrays and am trying to find matching values using comm. Each element of Array1 contains additional information that I strip out for the comparison, but I would like to keep that information after the comparison is complete.

For example:

Array1=("abc",123,"hello" "def",456,"world")
Array2=("abc")
declare -a Array1
declare -a Array2

I then compare the two arrays:

oldIFS=$IFS IFS=$'\n\t'
array3=($(comm -12 <(echo "${Array1[*]}" | awk -F "," {'print $1'} | sort) <(echo "${Array2[*]}" | sort)))
IFS=$oldIFS

Which finds the match of abc:

echo ${array3[0]}
abc

However, what I want is the whole matching element from Array1, including the values that were not part of my comm statement:

abc,123,hello

EDIT: For more clarification

The arrays in this example are populated with dummy data.

My real example pulls information from server logs, which I save into array1. array1 contains (userIDs,hostIPs,count), which I want to cross-reference against a list of userIDs (array2). My goal is to find out which userIDs exist in both array1 and array2 and save those IDs, with the additional information from array1 (hostIPs,count), into array3.

array1 is populated from a variable that holds the results of a curl command that generates a Splunk search. The data returned looks like this:

"uniqueID=","","",1

I save the results of the Splunk report as $splunk, and then declare array1 with the results of $splunk minus the header information, since the results come back in CSV format:

array1=( $(echo $splunk | sed 's/ /\n/g' | sed 1d) )

array2 is generated from a master file that I have stored locally, which contains all the application IDs in our ecosystem. For example:

uid=

I cat the contents of the master file into array2

array2=( $(cat master.txt) )

I then want to find which IDs from array1 exist in array2 and save them as array3. This requires some massaging of the data in array1 to make it match the format of array2.

oldIFS=$IFS IFS=$'\n\t'
array3=($(comm -12 <(echo "${array1[*]}" | sed 's/ /\n/g' | awk -F "\"," {'print $1'} | sed 's/"//g' | sed 's/|/ /g' | awk -F$'=' -v OFS=$'=' '{ $1 = "uid" }1' | grep -i "OU=People" | sed 's/OU/ou/g' | sort) <(echo "${array2[*]}" | sort)))
IFS=$oldIFS

array3 will then contain the lines that match in both arrays:

uid=
uid=

However, I am looking for something more along the lines of:

"uid=","","",1
"uid=","","",1

Benjamin W. · Accepted Answer

I would do it like this:

join -t, \
    <(printf '%s\n' "${Array1[@]}" | sort -t, -k1,1) \
    <(printf '%s\n' "${Array2[@]}" | sort)

Use the join command with , as the field delimiter. The first "file" is the first array, one element per line, sorted on the first field (comma delimited); the second "file" is the second array, one element per line, sorted.

The output will be every line where the first element of the first file matches the element from the second file; for the example input it's

abc,123,hello
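
To check this end to end, here is a minimal sketch that uses the sample data from the question (with each element quoted as a whole, which yields the same values) and collects the joined lines into Array3. It assumes Bash, because of the process substitution, and that no element contains a newline; the loop variable name line is only illustrative.

```bash
#!/usr/bin/env bash
# Sample data from the question; each Array1 element is "key,extra,extra".
Array1=("abc,123,hello" "def,456,world")
Array2=("abc")

# Join on the first comma-separated field. Both inputs are sorted on that
# field first, because join expects sorted input.
Array3=()
while IFS= read -r line; do
    Array3+=("$line")
done < <(
    join -t, \
        <(printf '%s\n' "${Array1[@]}" | sort -t, -k1,1) \
        <(printf '%s\n' "${Array2[@]}" | sort)
)

# Print the result, one element per line.
printf '%s\n' "${Array3[@]}"
```

Only the element whose first field appears in Array2 should end up in Array3, with its extra fields intact.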

This makes only one assumption, namely that no array element contains a newline. To make it more robust (assuming GNU Coreutils), we can use NUL as the delimiter:

join -z -t, \
    <(printf '%s\0' "${Array1[@]}" | sort -z -t, -k1,1) \
    <(printf '%s\0' "${Array2[@]}" | sort -z)

This prints the output separated by NUL as well; to read the result into an array, we can use readarray:

readarray -d '' -t Array3 < <(
    join -z -t, \
        <(printf '%s\0' "${Array1[@]}" | sort -z -t, -k1,1) \
        <(printf '%s\0' "${Array2[@]}" | sort -z)
)

readarray -d requires Bash 4.4 or newer. For older Bash, you can use a loop:

while IFS= read -r -d '' element; do
    Array3+=("$element")
done < <(
    join -z -t, \
        <(printf '%s\0' "${Array1[@]}" | sort -z -t, -k1,1) \
        <(printf '%s\0' "${Array2[@]}" | sort -z)
)
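
If the sorted output order of join is not wanted, a different technique (not the join approach above) is a lookup in a Bash associative array. This is only a sketch under some assumptions: Bash 4.0 or newer, the key is everything before the first comma, and no key is empty. The names wanted, id, element and key are made up for illustration.

```bash
#!/usr/bin/env bash
# Sample data from the question.
Array1=("abc,123,hello" "def,456,world")
Array2=("abc")

# Build a lookup set from Array2 (associative arrays need Bash 4.0+).
declare -A wanted=()
for id in "${Array2[@]}"; do
    wanted["$id"]=1
done

# Keep every Array1 element whose first comma-separated field is in the set.
Array3=()
for element in "${Array1[@]}"; do
    key=${element%%,*}    # strip everything from the first comma onwards
    if [[ -n $key && -n ${wanted["$key"]+set} ]]; then
        Array3+=("$element")
    fi
done

# Print the result, one element per line.
printf '%s\n' "${Array3[@]}"
```

This keeps the matches in their original Array1 order and needs no external commands, at the cost of requiring associative array support.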
