Piotr Migdal
Piotr Migdal

Reputation: 12792

A Pythonic way to read CSV with row and column headers

Let's have a CSV table with row and column headers, e.g.:

, "Car", "Bike", "Boat", "Plane", "Shuttle"
"Red", 1, 7, 3, 0, 0
"Green", 5, 0, 0, 0, 0
"Blue", 1, 1, 4, 0, 1

I want to get row and column headers, i.e.:

col_headers = ["Car", "Bike", "Boat", "Plane", "Shuttle"]
row_headers = ["Red", "Green", "Blue"]
data = [[1, 7, 3, 0, 0],
        [5, 0, 0, 0, 0],
        [1, 1, 4, 0, 1]]

Of course I can do something like

import csv
with open("path/to/file.csv", "r") as f:
    csvraw = list(csv.reader(f))
col_headers = csvraw[1][1:]
row_headers = [row[0] for row in csvraw[1:]]
data = [row[1:] for row in csvraw[1:]]

...but it does not look Pythonic enough.

Is there a neater way for this natural operation?

Upvotes: 7

Views: 13620

Answers (5)

i love mistaking
i love mistaking

Reputation: 103

Agree, pandas is the best I found too. I'm interested into read specific values of my dataframe. Here is what I did:

import pandas as pd
d=pd.read_csv(pathToFile+"easyEx.csv")
print(d)
print(d.index.values)
print(d.index.values[2])
print(d.columns.values)
print(d.columns.values[2])
print(pd.DataFrame(d,index=['Blue'],columns=['Boat'])+0.333)

And this is what it returns:

         Car  Bike  Boat  Plane  Shuttle                                                                                                            
Red      1     7     3      0        0                                                                                                            
Green    5     0     0      0        0                                                                                                            
Blue     1     1     4      0        1                                                                                                            
['Red' 'Green' 'Blue']                                                                                                                            
Blue                                                                                                                                              
['Car' 'Bike' 'Boat' 'Plane' 'Shuttle']                                                                                                           
Boat                                                                                                                                              
      Boat                                                                                                                                        
Blue     4.333                              

Note that I can check row names with 'index' and column names with 'columns'. Also note that I can read a specific element of the 'dataframe' base on its row and column names, and that the values are still numeric, that why I added '+0.333' in the last print.

I chaged the data file, I removed the quote chars (" ") and spaces after commas in the first row. Here you have 'easyEx.csv' file:

Car,Bike,Boat,Plane,Shuttle
Red, 1, 7, 3, 0, 0
Green, 5, 0, 0, 0, 0
Blue, 1, 1, 4, 0, 1

Hope this can help =)

Upvotes: 0

Jon Clements
Jon Clements

Reputation: 142126

Without 3rd party libs (and if you can live with results being tuples from the zip op):

import csv

with open('your_csv_file') as fin:
    csvin = csv.reader(fin, skipinitialspace=True)
    col_header = next(csvin, [])[1:]
    row_header, data = zip(*((row[0], row[1:]) for row in csvin))

Gives you for col_header, row_header and data:

['Bike', 'Boat', 'Plane', 'Shuttle']
('Red', 'Green', 'Blue')
(['1', '7', '3', '0', '0'], ['5', '0', '0', '0', '0'], ['1', '1', '4', '0', '1'])

Upvotes: 0

Piotr Migdal
Piotr Migdal

Reputation: 12792

Now I see that what I want is the easiest (and the most robust) to accomplish with Pandas.

import pandas as pd
df = pd.read_csv('foo.csv', index_col=0)

And if I want, it is easy to extract:

col_headers = list(df.columns)
row_headers = list(df.index)

Otherwise, in the "raw" Python, it seems that the method I wrote in the question is "good enough".

Upvotes: 2

Davoud Taghawi-Nejad
Davoud Taghawi-Nejad

Reputation: 16776

I am aware that this solution gives you another output format than the requested, but it is very convenient. This reads the csv line into a dictionary:

reader = csv.reader(open(parameters_file), dialect)

keys = [key.lower() for key in reader.next()]
for line in reader:
    parameter = dict(zip(keys, cells))

Upvotes: 0

Gareth Latty
Gareth Latty

Reputation: 88977

Take a look at csv.DictReader.

If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.

Then you can just do reader.fieldnames. This, of course, only gives you column headers. You would still have to parse the row headers manually.

I think your original solution is pretty good, however.

Upvotes: 4

Related Questions