Village

Reputation: 24393

Sorting and removing duplicate words in a line

The sort command lets me put lines in alphabetical order and remove duplicate lines. I need something similar that sorts the words within a single line and removes any duplicates. Is there a command for this?

E.g.:

zebra ant spider spider ant zebra ant

Changes to:

ant spider zebra

There is no space before the first word or after the last word.

Upvotes: 42

Views: 41777

Answers (7)

Carlo Wood

Reputation: 6791

All of the answers prior to this one can only sort a single line at a time. The following accepts any number of lines on standard input and prints the sorted list of unique words for each line.

one-liner (asorti is a GNU awk extension, so this needs gawk):

gawk '{ delete a; for (i=1; i<=NF; i++) a[$i]++; n=asorti(a, b); for (i=1; i<n; i++) printf "%s ", b[i]; print b[n] }'

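As executable script file sort_line_words:

#!/usr/bin/env -S gawk -f
# original source:
# https://stackoverflow.com/a/25823667/586229
#
# asorti() requires GNU awk (gawk).
#
# Usage:
#   gawk -f sort_line_words < INPUT_FILE > OUTPUT_FILE
#   cat INPUT_FILE | sort_line_words
# Example:
#   gawk -f sort_line_words < input.txt > output.txt
#   cat input.txt | sort_line_words | sort -u

{
    # Start a fresh set of words for every input line.
    delete a
    for (i = 1; i <= NF; i++) {
        a[$i]++
    }
    # Sort the unique words (the array indices) into b[1..n].
    # Calling asorti once, after the loop, also sets n to 0 on an
    # empty line, so the previous line's words are not reprinted.
    n = asorti(a, b)
    # Use a literal "%s" format so words containing % print safely.
    for (i = 1; i < n; i++) {
        printf "%s ", b[i]
    }
    print b[n]
}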

Thanks to @jaypal for much of the syntax used here.

Example:

$ cat file
group label wearable edit_group edit_group_order label_max camera_elevation camera_distance name label_min label_max value_min value_max camera_angle camera_elevation id
id group label wearable edit_group clothing_morph value_min value_max name value_default clothing_morph group
id label show_simple wearable name edit_group edit_group_order group clothing_morph clothing_morph camera_distance label_min label_max value_min value_max camera_distance camera_angle
id group label wearable name edit_group clothing_morph value_min value_max value_default
group label wearable id clothing_morph edit_group edit_group_order label_min label_max value_min value_max name camera_distance camera_angle camera_elevation
id group label wearable edit_group name label_min label_max value_min value_max wearable
name id group wearable edit_group id group wearable id group wearable id group wearable value_min value_max

$ cat file | sort_line_words
camera_angle camera_distance camera_elevation edit_group edit_group_order group id label label_max label_min name value_max value_min wearable
clothing_morph edit_group group id label name value_default value_max value_min wearable
camera_angle camera_distance clothing_morph edit_group edit_group_order group id label label_max label_min name show_simple value_max value_min wearable
clothing_morph edit_group group id label name value_default value_max value_min wearable
camera_angle camera_distance camera_elevation clothing_morph edit_group edit_group_order group id label label_max label_min name value_max value_min wearable
edit_group group id label label_max label_min name value_max value_min wearable
edit_group group id name value_max value_min wearable
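For comparison, here is a minimal Python sketch of the same per-line behaviour, assuming input arrives on standard input and words are separated by whitespace. The function name sort_line_words only mirrors the script name above and is not from any library. Python's sorted() compares strings by code point, which gives the same order as the output above for these ASCII identifiers but may differ from a locale-aware sort on other input.

```python
import sys


def sort_line_words(line: str) -> str:
    """Return the unique words of one line, sorted and joined by single spaces."""
    # split() with no argument splits on runs of whitespace and ignores
    # leading/trailing whitespace, so the result has no stray spaces.
    return " ".join(sorted(set(line.split())))


def main() -> None:
    # Handle each input line independently, like the awk script above;
    # an empty input line produces an empty output line.
    for line in sys.stdin:
        print(sort_line_words(line))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as sort_line_words.py, it would be run as python3 sort_line_words.py < file.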

Upvotes: 9

dogbane

Reputation: 274622

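Use tr to turn each space into a newline, then sort -u to sort the words and drop duplicates; finally, the unquoted command substitution in echo $(...) joins the lines back together with single spaces.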

echo $(tr ' ' '\n' <<< "zebra ant spider spider ant zebra ant" | sort -u)
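The three stages of that pipeline can be modelled in Python as a rough sketch (the function name tr_sort_echo is illustrative only). It shows one detail of the pipeline: tr turns every single space into a newline, so two adjacent spaces produce an empty line; sort -u keeps one copy of that empty line, and it is the word splitting of the unquoted $(...) that discards it. The sketch assumes plain lowercase ASCII words, where sort -u and Python's sorted() agree on the order.

```python
def tr_sort_echo(text: str) -> str:
    # Stage 1: tr ' ' '\n' -- each single space becomes a line break,
    # so consecutive spaces yield empty "lines" (empty strings here).
    column = text.split(" ")
    # Stage 2: sort -u -- sort and keep one copy of each distinct line.
    unique_sorted = sorted(set(column))
    # Stage 3: echo $(...) -- word splitting drops empty fields, and
    # echo joins the remaining words with single spaces.
    return " ".join(word for word in unique_sorted if word)


print(tr_sort_echo("zebra ant spider spider ant zebra ant"))  # ant spider zebra
print(tr_sort_echo("zebra  ant"))                             # ant zebra
```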

Upvotes: 11

user735796

Reputation:

The shell already splits [:blank:]-separated word lists into separate words, so using xargs is redundant. Removing duplicates could be done in the shell too, but it's easier to use sort -u.

echo $(printf '%s\n' zebra ant spider spider ant zebra ant | sort -u)
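Python's str.split() called with no argument behaves much like the shell's default field splitting (IFS of space, tab, and newline): any run of whitespace counts as one separator, and leading or trailing whitespace produces no empty words. A small sketch contrasting it with splitting on one literal space:

```python
text = "  zebra\tant  spider\n"

# Splitting on one literal space keeps empty strings, tabs, and newlines.
print(text.split(" "))  # ['', '', 'zebra\tant', '', 'spider\n']

# With no argument, any run of whitespace is a single separator and
# leading/trailing whitespace is ignored, much like shell word splitting.
print(text.split())     # ['zebra', 'ant', 'spider']

# Applied to the question's task: split, deduplicate, sort, join.
print(" ".join(sorted(set(text.split()))))  # ant spider zebra
```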

Upvotes: 34

jaypal singh

Reputation: 77105

Using awk:

awk '{for(i=1;i<=NF;i++) a[$i]++} END{for(i in a) printf "%s ", i;print ""}' INPUT_FILE

Test:

[jaypal:~/Temp] cat file
zebra ant spider spider ant zebra ant
[jaypal:~/Temp] awk '{for (i=1;i<=NF;i++) a[$i]++} END{for (i in a) printf "%s ", i;print ""}' file
zebra spider ant 
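The output above is not sorted because for (i in a) visits awk array elements in an unspecified order; the associative array only removes the duplicates. A Python sketch makes the two steps explicit. It relies on dict preserving insertion order, which the language guarantees from Python 3.7 onward; awk arrays make no such promise.

```python
words = "zebra ant spider spider ant zebra ant".split()

# Deduplicate only: dict keys act as a set, and Python dicts keep the
# order in which keys were first inserted.
first_seen = list(dict.fromkeys(words))
print(" ".join(first_seen))          # zebra ant spider

# Deduplicate and sort, which is what the question asks for.
print(" ".join(sorted(first_seen)))  # ant spider zebra
```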

Upvotes: 1

Birei

Reputation: 36262

Using perl:

perl -lane '
  %a = map { $_ => 1 } @F;
  print join qq[ ], sort keys %a;
' <<< "zebra ant spider spider ant zebra ant"

Result:

ant spider zebra

Upvotes: 2

kev

Reputation: 161704

Use Python:

$ echo "zebra ant spider spider ant zebra ant" | python -c 'import sys; print(" ".join(sorted(set(sys.stdin.read().split()))))'
ant spider zebra
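Because sys.stdin.read() reads the whole input at once, this one-liner merges the words of every input line into a single output line. A sketch of that behaviour, with the same expression wrapped in a function (the name unique_sorted_words is illustrative) so it can be called on a string:

```python
def unique_sorted_words(text: str) -> str:
    # Same expression as the one-liner: split the whole text on any
    # whitespace (newlines included), deduplicate, sort, and join.
    return " ".join(sorted(set(text.split())))


two_lines = "zebra ant spider\nspider bee ant\n"
print(unique_sorted_words(two_lines))  # ant bee spider zebra
```

To keep lines separate, the same expression would instead be applied to each line while iterating over sys.stdin.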

Upvotes: 2

jcollado

Reputation: 40394

This works for me:

$ echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
ant spider zebra

You can transform a list of words in a single row into a single column with xargs -n1, use sort -u to sort it and drop duplicates, and transform it back into a single row with xargs.

Upvotes: 84
