Reputation: 171

Unix "sort" command for a CSV file

I have a .csv file with entries that look like:

"29 January 2016 19:33 EST","Mary Z Allen",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...
"22 February 2016 12:08 EST","Shawn Baker",...

The first CSV field (date/time) is assigned by the system and always has exactly five words. The second CSV field (name) consists of one or more words.

I want to sort by the final word in the second field. For this example, the desired order after sorting would be:

"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

No doubt, with a little effort, one could come up with a bash, awk, or python script to perform this kind of sort. But is there a way to use the sort command directly?

The specific Unix version I am using (from /proc/version) is

Linux version 3.13.0-79-generic (buildd@lcy01-11) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #123-Ubuntu SMP Fri Feb 19 14:28:32 UTC 2016

Upvotes: 0

Answers (3)

Walter A

Reputation: 20032

You can use sed to copy the last word of the name to the front of each line. That way sorting is easy, and afterwards you only need to delete the extra data. The sed command needs to match strings without a double quote using [^"]*, resulting in

sed 's/\("[^"]*","[^"]* \)\([^"]*"\)/\2=\1\2/' testfile | sort | cut -d= -f2-
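
One caveat: the pattern above needs a space inside the second field, so a one-word name is left undecorated and ends up sorted by its date instead. A possible variant, offered as a sketch rather than a drop-in replacement, makes the leading words optional and sorts on the key alone. The function name sort_by_last_word_sed is made up for illustration. It assumes the first two fields are double-quoted with no embedded double quotes, and that the last word of the name contains no = character.

```shell
# Hypothetical helper: sort a CSV file by the last word of its second field.
# Assumes the first two fields are double-quoted with no embedded quotes,
# and that the last word of the name contains no "=".
sort_by_last_word_sed() {
    # Decorate: prepend "<last word>=" to each line. The optional group
    # \(...\)\{0,1\} swallows any leading words, so one-word names match too.
    sed 's/^\("[^"]*","\([^"]* \)\{0,1\}\([^"]*\)".*\)$/\3=\1/' "$1" |
        LC_ALL=C sort -t= -k1,1 |   # compare the key only, in plain byte order
        cut -d= -f2-                # undecorate: drop the key, keep the rest
}
```

Called as sort_by_last_word_sed testfile, this should print the lines ordered by surname. LC_ALL=C is only there to make the ordering predictable (uppercase sorts before lowercase); drop it if you prefer your locale's collation.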

Upvotes: 0

karakfa

Reputation: 67567

awk to the rescue, using the decorate/sort/undecorate pattern.

$ awk -F, '{t=$2; sub(/.+ /,"",t); print t"\t"$0}' file | sort | cut -f2-

"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

This prints the last word of the second field as a key, sorts on it, and then removes the dummy key.

Upvotes: 2

dannysauer

Reputation: 3877

No. The sort command can split into fields, so if you just wanted to sort by name, you could do something like sort -t, -k2. But for this, what you'll have to do is to split the lines out. Here's a very simplistic example of extracting the thing you want to sort upon, prepending it to the line, sorting on only the first field, then removing that field.

user@machine[/home/user/dev]
$ cat testfile
"22 February 2016 12:08 EST","Shawn Baker",...
"29 January 2016 19:33 EST","Mary Z Allen",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...
user@machine[/home/user/dev]
$ paste <(cut -d, -f2 testfile | awk '$0=$NF') testfile | sort -k1,1 | cut -f2-
"29 January 2016 19:33 EST","Mary Z Allen",...
"22 February 2016 12:08 EST","Shawn Baker",...
"01 February 2016 16:29 EST","Kendra A Zimmerman",...

Note that this code to extract the desired field makes the bad assumption that the first and second fields won't contain a comma: cut -d, -f2 testfile | awk '$0=$NF'. If they might, then you'll want to replace it with something smarter. The rest of the code should be fine, as paste and cut default to tabs, and sort/awk are using whitespace.
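
One possible "something smarter", as a rough sketch only: let awk split on the three-character sequence "," instead of a bare comma, so a comma inside a quoted field no longer shifts the field count. The function name sort_by_last_word is hypothetical, and the sketch assumes the first two fields are always double-quoted and contain no embedded double quotes.

```shell
# Hypothetical helper: decorate/sort/undecorate, splitting on the literal
# sequence "," so that commas inside the quoted fields are harmless.
# Assumes the first two fields are double-quoted with no embedded quotes.
sort_by_last_word() {
    awk -F'","' '{
        name = $2
        sub(/".*/, "", name)          # cut at the closing quote of field 2
        n = split(name, words, " ")   # words[n] is the last word of the name
        print words[n] "\t" $0        # decorate with a tab-separated key
    }' "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 |   # compare the key only
        cut -f2-                                    # remove the key again
}
```

Running sort_by_last_word testfile should give the same order as the pipeline above for the sample data, and it also copes with one-word names because split simply returns a single element. LC_ALL=C is an addition for predictable byte-order sorting and can be removed.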

Upvotes: 0
