Reputation: 75
I'm trying to sort this line (2010 census block population densities for the whole US) by the last column.
12001,2,1009,Alachua FL,29.65612,-82.327274,0.0005131,0.013289229,12,902.9869232
censusBlockDensities.csv
(moved here from comment)
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
Upvotes: 3
Views: 538
Reputation: 7494
I am assuming a unix shell (i.e., bash).
Read the manual page for the sort command:
man sort
From the man page:
The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
export LC_ALL=C
sort -t , -k 10,10 -n censusBlockDensities.csv
Explanation of the flags:
-t ,
: Specifies commas as field delimiters.
-k 10,10
: Specifies sorting on the 10th field only (start,stop) (the first field is 1, not 0).
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV], which override global ordering options for that key. If no key is given, use the entire line as the key.
-n
: Perform a numeric sort instead of the default alphanumeric sort (alternatively, add the 'n' to the -k
parameter as mention in a comment below).
censusBlockDensities.csv
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
output:
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
Edit: A helpful comment indicated an error in my answer. You also need the -n
flag to perform a numeric sort (default is alphanumeric). I've amended my answer to include that. You can verify that it is working properly by also trying out the -r
flag to sort in reverse order. I also added the stop field index to the -k 10
parameter, as mentioned in another post.
Additionally, you should check your input file to make sure you have the same number of fields in each line:
awk '{print gsub(/,/,"")}' censusBlockDensities.csv
9
9
10 <-- the third record has an additional field
9
9
Upvotes: 5