blehman

Reputation: 1970

Using sort | awk on one column from a csv?

Using p.txt:

$cat p.txt
R 3
R 4
S 1
S 2
R 1
T 1
R 3

The following command sorts based on the second column:

$cat p.txt | sort -k2
R 1
S 1
T 1
S 2
R 3
R 3
R 4

The following command removes repeated values in the second column:

$cat p.txt | sort -k2 | awk '!x[$2]++'
R 1
S 2
R 3
R 4

Now replacing the space with a comma, we have the following file:

$cat p1.csv
R,3
R,4
S,1
S,2
R,1
T,1
R,3

The following command still sorts based on the second column:

$cat p1.csv | sort -t "," -k2
R,1
S,1
T,1
S,2
R,3
R,3
R,4

Below is NOT the correct output:

$cat p1.csv | sort -t "," -k2 | awk '!x[$2]++'
R,1

Correct output:

R,1
S,2
R,3
R,4

Any suggestions?

Upvotes: 1

Views: 10876

Answers (4)

Kent

Reputation: 195039

Since you are already using sort, you don't need awk at all: sort has a -u option.

The cat isn't needed either:

sort -t, -k2 -u p1.csv 

should give you expected output.
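As a minimal sketch of this approach, the snippet below recreates p1.csv inline with printf and deduplicates on the second field. The variable names `result` and `keys` are only for illustration. It inspects just the second column, because which of several tied lines `sort -u` keeps is not something to rely on.

```shell
# Recreate the sample data from the question and keep one line per distinct field 2.
result=$(printf 'R,3\nR,4\nS,1\nS,2\nR,1\nT,1\nR,3\n' | sort -t, -k2 -u)

# Join the surviving second-column values onto one line for easy inspection.
keys=$(printf '%s\n' "$result" | cut -d, -f2 | paste -sd ' ' -)
echo "$keys"   # prints: 1 2 3 4
```

If you need a specific line from each group (for example R,1 rather than S,1), the sort | awk -F, pipeline gives you more control over which one comes first.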

Upvotes: 5

abasu

Reputation: 2524

You don't need all of that; sort and uniq are enough:

sort -t "," -k2 p1.csv | uniq -s 2

uniq -s 2 tells uniq to skip the first 2 characters of each line (i.e. up to and including the comma) when comparing.
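A small sketch of the command on the question's data, recreated inline with printf. The variable name `out` is illustrative, and `LC_ALL=C` is added here only to make the tie-breaking order predictable.

```shell
# Sort on field 2, then let uniq ignore the first 2 characters ("R," etc.)
# so that adjacent lines are compared on the second column only.
out=$(printf 'R,3\nR,4\nS,1\nS,2\nR,1\nT,1\nR,3\n' | LC_ALL=C sort -t, -k2 | uniq -s 2)
echo "$out"
# prints:
# R,1
# S,2
# R,3
# R,4
```

Note that -s counts characters, not fields, so this only works while the first column is exactly one character wide.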

Upvotes: 4

jaypal singh

Reputation: 77095

You need to tell awk what the field separator is:

 cat p1.csv | sort -t "," -k2 | awk -F, '!x[$2]++'

Upvotes: 1

Ram Rajamony

Reputation: 1723

Try awk -F, in your last command. So:

cat p1.csv | sort -t "," -k2 | awk -F, '!x[$2]++'

Since your fields are separated by commas, you need to tell awk that the field separator is no longer whitespace, but instead the comma. The -F option to awk does that.
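To see the difference, here is a minimal sketch that prints what awk treats as the second field of one line from p1.csv, with and without -F. The variable names are illustrative only.

```shell
# Without -F, awk splits on whitespace, so "R,3" is one field and $2 is empty.
without_f=$(printf 'R,3\n' | awk '{ print "[" $2 "]" }')

# With -F, the comma is the separator and $2 is "3".
with_f=$(printf 'R,3\n' | awk -F, '{ print "[" $2 "]" }')

echo "$without_f"   # prints: []
echo "$with_f"      # prints: [3]
```

Because $2 is empty on every line without -F, the expression !x[$2]++ sees the same (empty) key each time and lets only the first line through, which is why the original command printed just R,1.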

Upvotes: 4
