Reputation: 2413
I've got almost the same question as here.
I have an array which contains aa ab aa ac aa ad, etc.
Now I want to select all unique elements from this array.
I thought this would be simple with sort | uniq or with sort -u, as they mentioned in that other question, but nothing changed in the array...
The code is:
echo `echo "${ids[@]}" | sort | uniq`
What am I doing wrong?
Upvotes: 144
Views: 148114
Reputation: 11
Bash one-liner that keeps the original order and handles items containing spaces:
readarray -t my_array < <( (for i in "${my_array[@]}"; do echo "$i"; done) | awk '!uniq[$0]++' )
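A worked run of the same idea, as a minimal sketch. The sample values in my_array and the awk array name seen are made up for illustration, and printf '%s\n' stands in for the echo loop:

```shell
#!/usr/bin/env bash
# Hypothetical sample data: duplicates plus an element containing a space.
my_array=("b c" a "b c" a d)

# Print one element per line, let awk keep only the first occurrence of
# each line, and read the surviving lines back into the same array.
readarray -t my_array < <(printf '%s\n' "${my_array[@]}" | awk '!seen[$0]++')

# Show each element in angle brackets so embedded spaces are visible.
printf '<%s>\n' "${my_array[@]}"
```

Elements that contain newlines would still be split, because the newline is the record separator here.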
Upvotes: 0
Reputation: 66
In zsh you can use the (u) flag:
$ ids=(aa ab aa ac aa ad)
$ print ${(u)ids}
aa ab ac ad
Upvotes: 1
Reputation: 47269
A bit hacky, but this should do it:
echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '
To save the sorted unique results back into an array, use array assignment:
sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
If your shell supports herestrings (bash should), you can spare an echo process by altering it to:
tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '
A note as of Aug 28 2021:
According to the ShellCheck wiki page for SC2207, read -a should be used to split the command's output, rather than relying on an unquoted command substitution.
Thus, in bash the command would be:
IFS=" " read -r -a ids <<< "$(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')"
or
IFS=" " read -r -a ids <<< "$(tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' ')"
Input:
ids=(aa ab aa ac aa ad)
Output:
aa ab ac ad
Explanation:
"${ids[@]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The @ part means "all elements in the array".
tr ' ' '\n' - Convert all spaces to newlines, because your array is seen by the shell as elements on a single line, separated by spaces, and sort expects input to be on separate lines.
sort -u - Sort and retain only unique elements.
tr '\n' ' ' - Convert the newlines we added earlier back to spaces.
$(...) - Command substitution.
tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'
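To make the "sort expects input on separate lines" point concrete, here is a small sketch that counts the lines sort -u emits with and without the tr step. The variable names one_line_count and per_line_count are only illustrative:

```shell
#!/usr/bin/env bash
ids=(aa ab aa ac aa ad)

# echo joins the elements with spaces, so sort receives one single line
# and has nothing to deduplicate.
one_line_count=$(echo "${ids[@]}" | sort -u | wc -l)

# After tr, every element sits on its own line and sort -u can work.
per_line_count=$(echo "${ids[@]}" | tr ' ' '\n' | sort -u | wc -l)

# $(( )) strips the padding some wc implementations add.
echo "lines without tr: $((one_line_count))"
echo "lines with tr:    $((per_line_count))"
```

This is why the command in the question appeared to change nothing: the whole array reached sort as a single record.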
Upvotes: 190
Reputation: 56936
All of the following work in bash and pass shellcheck, provided you suppress SC2207. They rely on arrays, so they will not run under a strictly POSIX sh.
arrOrig=("192.168.3.4" "192.168.3.4" "192.168.3.3")
# NO SORTING
# shellcheck disable=SC2207
arr1=($(tr ' ' '\n' <<<"${arrOrig[@]}" | awk '!u[$0]++' | tr '\n' ' ')) # @estani
len1=${#arr1[@]}
echo "${len1}"
echo "${arr1[*]}"
# SORTING
# shellcheck disable=SC2207
arr2=($(printf '%s\n' "${arrOrig[@]}" | sort -u)) # @das.cyklone
len2=${#arr2[@]}
echo "${len2}"
echo "${arr2[*]}"
# SORTING
# shellcheck disable=SC2207
arr3=($(echo "${arrOrig[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')) # @sampson-chen
len3=${#arr3[@]}
echo "${len3}"
echo "${arr3[*]}"
# SORTING
# shellcheck disable=SC2207
arr4=($(for i in "${arrOrig[@]}"; do echo "${i}"; done | sort -u)) # @corbyn42
len4=${#arr4[@]}
echo "${len4}"
echo "${arr4[*]}"
# NO SORTING
# shellcheck disable=SC2207
arr5=($(echo "${arrOrig[@]}" | tr "[:space:]" '\n' | awk '!a[$0]++')) # @faustus
len5=${#arr5[@]}
echo "${len5}"
echo "${arr5[*]}"
# OUTPUTS
# arr1
2 # length
192.168.3.4 192.168.3.3 # items
# arr2
2 # length
192.168.3.3 192.168.3.4 # items
# arr3
2 # length
192.168.3.3 192.168.3.4 # items
# arr4
2 # length
192.168.3.3 192.168.3.4 # items
# arr5
2 # length
192.168.3.4 192.168.3.3 # items
Output for all of these is 2 and correct. This answer basically summarises and tidies up the other answers in this post and is a useful quick reference. Attribution to original answer is given.
Upvotes: 2
Reputation: 46816
If you're running Bash version 4 or above (which should be the case in any modern version of Linux), you can get unique array values in bash by creating a new associative array that contains each of the values of the original array. Something like this:
$ a=(aa ac aa ad "ac ad")
$ declare -A b
$ for i in "${a[@]}"; do b["$i"]=1; done
$ printf '%s\n' "${!b[@]}"
ac ad
ac
aa
ad
This works because in any array (associative or traditional, in any language), each key can only appear once. When the for loop arrives at the second value of aa in a[2], it overwrites b[aa] which was set originally for a[0].
Doing things in native bash can be faster than using pipes and external tools like sort and uniq, though for larger datasets you'll likely see better performance if you use a more powerful language like awk, python, etc.
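Because the key order of an associative array is unspecified, the loop above does not keep the original ordering. A possible variation, sketched here with the hypothetical names seen and unique, uses the associative array only as a "have I seen this?" lookup and appends first occurrences to a regular array:

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
a=(aa ac aa ad "ac ad" ac)

declare -A seen=()   # element -> 1 once it has been encountered
unique=()            # result, in first-seen order

for i in "${a[@]}"; do
  # ${seen[...]+x} expands to "x" only if the key is already set.
  if [[ -z ${seen["$i"]+x} ]]; then
    seen["$i"]=1
    unique+=("$i")
  fi
done

printf '%s\n' "${unique[@]}"
```

Note that an empty string is not a valid associative-array key in bash, so this sketch assumes no element is empty.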
If you're feeling confident, you can avoid the for loop by using printf's ability to recycle its format for multiple arguments, though this seems to require eval. (Stop reading now if you're fine with that.)
$ eval b=( $(printf ' ["%s"]=1' "${a[@]}") )
$ declare -p b
declare -A b=(["ac ad"]="1" [ac]="1" [aa]="1" [ad]="1" )
The reason this solution requires eval is that array values are determined before word splitting. That means that the output of the command substitution is considered a single word rather than a set of key=value pairs.
While this uses a subshell, it uses only bash builtins to process the array values. Be sure to evaluate your use of eval with a critical eye. If you're not 100% confident that chepner or glenn jackman or greycat would find no fault with your code, use the for loop instead.
Upvotes: 45
Reputation: 1038
Another option for dealing with embedded whitespace is to null-delimit with printf, make distinct with sort, then use a loop to pack it back into an array:
input=(a b c "$(printf "d\ne")" b c "$(printf "d\ne")")
output=()
while read -rd $'' element
do
output+=("$element")
done < <(printf "%s\0" "${input[@]}" | sort -uz)
At the end of this, input and output contain the desired values (provided order isn't important):
$ printf "%q\n" "${input[@]}"
a
b
c
$'d\ne'
b
c
$'d\ne'
$ printf "%q\n" "${output[@]}"
a
b
c
$'d\ne'
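On newer shells the while loop can probably be replaced by readarray with a NUL delimiter. This is a sketch that assumes bash 4.4 or later (for readarray -d) and a sort that supports -z, as GNU sort does:

```shell
#!/usr/bin/env bash
# Assumes bash 4.4+ (readarray -d) and a sort with -z (e.g. GNU coreutils).
input=(a b c $'d\ne' b c $'d\ne')

# Emit NUL-terminated records, deduplicate them, and read them back using
# NUL as the delimiter; -t drops the trailing delimiter from each element.
readarray -d '' -t output < <(printf '%s\0' "${input[@]}" | sort -uz)

printf '%q\n' "${output[@]}"
```

As with the loop version, the result comes back in sorted order rather than the original order.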
Upvotes: 3
Reputation: 11
# Read the file one line at a time (a plain "for line in $lines" would split on every word, not every line)
while IFS= read -r line; do
  # Print the line
  echo "$line"
# End the loop, then sort the output (-u keeps only unique lines)
done < /path/to/my/file | sort -u
Upvotes: -2
Reputation: 3127
cat number.txt
1 2 3 4 4 3 2 5 6
Print the line as a column:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}'
1
2
3
4
4
3
2
5
6
find the duplicate records:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' |awk 'x[$0]++'
4
3
2
Remove duplicate records:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i}' |awk '!x[$0]++'
1
2
3
4
5
6
Find only the records that occur exactly once:
cat number.txt | awk '{for(i=1;i<=NF;i++) print $i|"sort|uniq -u"}'
1
5
6
Upvotes: 5
Reputation: 121
If you want a solution that only uses bash internals, you can set the values as keys in an associative array, and then extract the keys:
declare -A uniqs
list=(foo bar bar "bar none")
for f in "${list[@]}"; do
uniqs["${f}"]=""
done
for thing in "${!uniqs[@]}"; do
echo "${thing}"
done
This will output
bar
foo
bar none
Upvotes: 5
Reputation: 13
Try this to get the unique values of the first column of a comma-separated file:
awk -F, '{a[$1];}END{for (i in a)print i;}'
Upvotes: 0
Reputation: 151
'sort' can be used to order the output of a for-loop:
for i in "${ids[@]}"; do echo "$i"; done | sort
and eliminate duplicates with "-u":
for i in "${ids[@]}"; do echo "$i"; done | sort -u
Finally you can just overwrite your array with the unique elements (note that elements containing whitespace will still be split here):
ids=( $(for i in "${ids[@]}"; do echo "$i"; done | sort -u) )
Upvotes: 15
Reputation: 5502
To create a new array consisting of unique values, ensure your array is not empty then do one of the following:
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | awk '!x[$0]++')
Warning: Do not try to do something like NewArray=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) ). It will break on spaces.
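A small sketch comparing the two forms, plus one way to honour the "not empty" precondition. The array names Split, Kept and Empty and the sample data are hypothetical:

```shell
#!/usr/bin/env bash
OriginalArray=("foo bar" baz "foo bar")

# Unquoted command substitution: the output is split on all whitespace,
# so "foo bar" falls apart into two elements.
# shellcheck disable=SC2207
Split=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) )

# readarray -t: one element per line, spaces preserved.
readarray -t Kept < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)

echo "command substitution: ${#Split[@]} elements"
echo "readarray:            ${#Kept[@]} elements"

# printf '%s\n' with no arguments still prints one empty line, which would
# become a single empty element, so skip the pipeline for an empty array.
Empty=()
NewArray=()
if ((${#Empty[@]} > 0)); then
  readarray -t NewArray < <(printf '%s\n' "${Empty[@]}" | sort -u)
fi
echo "from empty array:     ${#NewArray[@]} elements"
```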
Upvotes: 11
Reputation: 323
This one will also preserve order:
echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'
and to modify the original array with the unique values:
ARRAY=($(echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'))
Upvotes: 12
Reputation: 26447
Without losing the original ordering:
uniques=($(tr ' ' '\n' <<<"${original[@]}" | awk '!u[$0]++' | tr '\n' ' '))
Upvotes: 5
Reputation: 679
If your array elements contain whitespace or any other shell special character (and can you be sure they don't?), then first of all (and you should just always do this) expand your array in double quotes, e.g. "${a[@]}". Bash will interpret this literally as "each array element in a separate argument". Within bash this always works.
Then, to get a sorted (and unique) array, we have to convert it to a format sort understands and be able to convert it back into bash array elements. This is the best I've come up with:
eval a=($(printf "%q\n" "${a[@]}" | sort -u))
Unfortunately, this fails in the special case of the empty array, turning the empty array into an array of 1 empty element (because printf had 0 arguments but still prints as though it had one empty argument - see explanation). So you have to catch that in an if or something.
Explanation: The %q format for printf "shell escapes" the printed argument in just such a way that bash can recover it in something like eval. Because each element is printed shell-escaped on its own line, the only separator between elements is the newline, and the array assignment takes each line as an element, parsing the escaped values into literal text.
e.g.
> a=("foo bar" baz)
> printf "%q\n" "${a[@]}"
foo\ bar
baz
> printf "%q\n"
''
The eval is necessary to strip the escaping off each value going back into the array.
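One way to handle the empty-array case mentioned above is a length check before the eval. This is only a sketch: the sample data and the second array name empty are made up, and the eval argument is quoted here as a precaution:

```shell
#!/usr/bin/env bash
a=("foo bar" baz "foo bar" $'tab\there')

# Only run the eval when there is at least one element; otherwise
# printf '%q\n' would print '' and create one empty element.
if ((${#a[@]} > 0)); then
  eval "a=($(printf '%q\n' "${a[@]}" | sort -u))"
fi
printf '<%s>\n' "${a[@]}"

# The same guard leaves an empty array untouched.
empty=()
if ((${#empty[@]} > 0)); then
  eval "empty=($(printf '%q\n' "${empty[@]}" | sort -u))"
fi
echo "empty still has ${#empty[@]} elements"
```

The sort here operates on the escaped text, so the resulting order follows the escaped form of each element rather than its literal value.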
Upvotes: 19
Reputation: 461
I realize this was already answered, but it showed up pretty high in search results, and it might help someone.
printf "%s\n" "${IDS[@]}" | sort -u
Example:
~> IDS=( "aa" "ab" "aa" "ac" "aa" "ad" )
~> echo "${IDS[@]}"
aa ab aa ac aa ad
~>
~> printf "%s\n" "${IDS[@]}" | sort -u
aa
ab
ac
ad
~> UNIQ_IDS=($(printf "%s\n" "${IDS[@]}" | sort -u))
~> echo "${UNIQ_IDS[@]}"
aa ab ac ad
~>
Upvotes: 33