Reputation: 105
I need to sort a CSV table by the column name. The data I get in CSV has various number of columns, but the column name is static. This is the data I have:
X,Blue,Black,Red,Green,Total
Thomas,2,0,0,0,2
Andy,0,1,0,0,1
Luise,0,0,2,1,3
Mark,1,0,0,1,2
Ronda,0,1,0,1,2
Pretty print:
X Blue Black Red Green Total
Thomas 2 0 0 0 2
Andy 0 1 0 0 1
Luise 0 0 2 1 3
Mark 1 0 0 1 2
Ronda 0 1 0 1 2
Normally I could just sort by row[5]
but if I get a table without any Greens, or a table with additional column - Yellow, it will no longer work.
Therefore the question is, how can I sort this CSV file by the column named Total
?
Can only use core python modules, no pandas. :(
Upvotes: 7
Views: 12668
Reputation: 46759
If you need to sort using a column name, it would be best to read your file using a Python csv.DictReader()
object as follows:
import csv
with open('input.csv', 'r', newline='') as f_input:
csv_input = csv.DictReader(f_input)
data = sorted(csv_input, key=lambda row: (row['Total'], row['X']))
with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.DictWriter(f_output, fieldnames=csv_input.fieldnames)
csv_output.writeheader()
csv_output.writerows(data)
This would then allow you to sort based on the Total
column regardless of where it is. The csv.DictReader()
reads each row as a dictionary, using the first row as a header. The header values can then be used as dictionary keys to access items rather than using positional values with a normal csv.reader()
. So your first data row would be read as:
{'Total': '1', 'Blue': '0', 'Black': '1', 'Green': '0', 'X': 'Andy', 'Red': '0'}
csv_input
can then be passed directly to sorted()
to create a sorted list of row dictionaries. The key
is the total
field and then also the X
field for the cases where the total is the same.
This would then write a sorted output.csv
as follows:
X,Blue,Black,Red,Green,Total
Andy,0,1,0,0,1
Mark,1,0,0,1,2
Ronda,0,1,0,1,2
Thomas,2,0,0,0,2
Luise,0,0,2,1,3
Upvotes: 7
Reputation: 1406
You can define a function to sort according to any column.
import csv
import operator
data = csv.reader(open('input.csv'))
# function to sort according to any column.
# table corresponds to data and col is argument for the row number. here 5
def sort_table(table, col=0):
return sorted(table, key=operator.itemgetter(col))
print(sort_table(data, 5))
However pandas are better option. Try getting used to that.
Upvotes: 1
Reputation: 866
Something like this,
reader = csv.reader(open('input.csv'))
header = reader.next()
sort_col_idx = header.index('Total')
sorted(reader, key=lambda row: row[sort_col_idx]) # Without header
[header] + sorted(reader, key=lambda row: row[sort_col_idx]) # With headers.
Upvotes: 1