Reputation: 12792
Let's have a CSV table with row and column headers, e.g.:
, "Car", "Bike", "Boat", "Plane", "Shuttle"
"Red", 1, 7, 3, 0, 0
"Green", 5, 0, 0, 0, 0
"Blue", 1, 1, 4, 0, 1
I want to get row and column headers, i.e.:
col_headers = ["Car", "Bike", "Boat", "Plane", "Shuttle"]
row_headers = ["Red", "Green", "Blue"]
data = [[1, 7, 3, 0, 0],
[5, 0, 0, 0, 0],
[1, 1, 4, 0, 1]]
Of course I can do something like
import csv
with open("path/to/file.csv", "r") as f:
csvraw = list(csv.reader(f))
col_headers = csvraw[1][1:]
row_headers = [row[0] for row in csvraw[1:]]
data = [row[1:] for row in csvraw[1:]]
...but it does not look Pythonic enough.
Is there a neater way for this natural operation?
Upvotes: 7
Views: 13620
Reputation: 103
Agree, pandas is the best I found too. I'm interested into read specific values of my dataframe. Here is what I did:
import pandas as pd
d=pd.read_csv(pathToFile+"easyEx.csv")
print(d)
print(d.index.values)
print(d.index.values[2])
print(d.columns.values)
print(d.columns.values[2])
print(pd.DataFrame(d,index=['Blue'],columns=['Boat'])+0.333)
And this is what it returns:
Car Bike Boat Plane Shuttle
Red 1 7 3 0 0
Green 5 0 0 0 0
Blue 1 1 4 0 1
['Red' 'Green' 'Blue']
Blue
['Car' 'Bike' 'Boat' 'Plane' 'Shuttle']
Boat
Boat
Blue 4.333
Note that I can check row names with 'index' and column names with 'columns'. Also note that I can read a specific element of the 'dataframe' base on its row and column names, and that the values are still numeric, that why I added '+0.333' in the last print.
I chaged the data file, I removed the quote chars (" ") and spaces after commas in the first row. Here you have 'easyEx.csv' file:
Car,Bike,Boat,Plane,Shuttle
Red, 1, 7, 3, 0, 0
Green, 5, 0, 0, 0, 0
Blue, 1, 1, 4, 0, 1
Hope this can help =)
Upvotes: 0
Reputation: 142126
Without 3rd party libs (and if you can live with results being tuples from the zip
op):
import csv
with open('your_csv_file') as fin:
csvin = csv.reader(fin, skipinitialspace=True)
col_header = next(csvin, [])[1:]
row_header, data = zip(*((row[0], row[1:]) for row in csvin))
Gives you for col_header
, row_header
and data
:
['Bike', 'Boat', 'Plane', 'Shuttle']
('Red', 'Green', 'Blue')
(['1', '7', '3', '0', '0'], ['5', '0', '0', '0', '0'], ['1', '1', '4', '0', '1'])
Upvotes: 0
Reputation: 12792
Now I see that what I want is the easiest (and the most robust) to accomplish with Pandas.
import pandas as pd
df = pd.read_csv('foo.csv', index_col=0)
And if I want, it is easy to extract:
col_headers = list(df.columns)
row_headers = list(df.index)
Otherwise, in the "raw" Python, it seems that the method I wrote in the question is "good enough".
Upvotes: 2
Reputation: 16776
I am aware that this solution gives you another output format than the requested, but it is very convenient. This reads the csv line into a dictionary:
reader = csv.reader(open(parameters_file), dialect)
keys = [key.lower() for key in reader.next()]
for line in reader:
parameter = dict(zip(keys, cells))
Upvotes: 0
Reputation: 88977
Take a look at csv.DictReader
.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.
Then you can just do reader.fieldnames
. This, of course, only gives you column headers. You would still have to parse the row headers manually.
I think your original solution is pretty good, however.
Upvotes: 4