Victor Yip

Reputation: 295

Parsing CSV in Python 101

I'm trying to understand/visualise the process of parsing a raw CSV data file in Python, from dataquest.io's training course.

I understand that rows = data.split('\n') splits the one long string read from the CSV file into rows wherever there is a line break, i.e.:

day1, sunny, \n day2, rain \n

becomes

day1, sunny
day2, rain

I thought the for loop would further break the data into something like:

day 1 
sunny 
day 2 
rain

Instead, the course seems to imply that it actually becomes a list of lists. I don't understand why that happens.

weather_data = []

f = open("la_weather.csv", 'r')
data = f.read()
rows = data.split('\n')
for row in rows:
    split_row = row.split(",")
    weather_data.append(split_row)

Upvotes: 2

Views: 350

Answers (3)

riker

Reputation: 46

I'm ignoring the CSV stuff and concentrating just on your list misunderstanding. When you split the text of the file on line breaks, the result is a list of strings. That is, rows becomes: ["day1, sunny", "day2, rain"].

The for statement, applied to a list, iterates through the elements of that list. So, on the first time through row will be "day1, sunny", the second time through it will be "day2, rain", etc.

Inside each iteration of the for loop, it creates a new list, by splitting row at the commas into, eg, ["day1"," sunny"]. All of these lists are added to the weather_data list you created at the start. You end up with a list of lists, ie [['day1', ' sunny'], ['day2', ' rain']]. If you wanted ['day1', ' sunny', 'day2', ' rain'], you could do:

for row in rows:
    split_row = row.split(",")
    for ele in split_row:
        weather_data.append(ele)
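
To see the two shapes side by side, here is a minimal sketch. The sample string is an assumption standing in for the real contents of la_weather.csv, and the names nested and flat are made up for illustration:

```python
# Sample text standing in for the file contents (an assumption, not the real file).
data = "day1, sunny\nday2, rain"

rows = data.split("\n")  # a list of strings, one per line

nested = []  # will hold one inner list per row
flat = []    # will hold every field in a single list
for row in rows:
    split_row = row.split(",")
    nested.append(split_row)  # adds the whole list as ONE element
    flat.extend(split_row)    # adds each field as its own element

print(nested)  # [['day1', ' sunny'], ['day2', ' rain']]
print(flat)    # ['day1', ' sunny', 'day2', ' rain']
```

append adds its argument as a single item, which is why appending a list gives you a list inside a list. extend does the same job as the inner for loop above.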

Upvotes: 3

Daniel Roseman

Reputation: 600059

That code does make it a list of lists.

As you say, the first split converts the data into a list, one element per line.

Then, for each line, the second split converts it into another list, one element per column.

And then the second list is appended, as a single item, to the weather_data list - which is now, as the instructions say, a list of lists.

Note that this code isn't very good - quite apart from the fact that you would always use the csv module, as others have pointed out, you would never do f.read() and then split the result. You would just do for line in f which automatically iterates over each row.
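
A rough sketch of the for line in f approach, assuming Python 3. io.StringIO is used here only as a stand-in for the opened file so the snippet runs without la_weather.csv. With a real file you would iterate over the object returned by open() in the same way:

```python
import io

# io.StringIO stands in for open("la_weather.csv") in this sketch.
f = io.StringIO("day1, sunny\nday2, rain\n")

weather_data = []
for line in f:                 # yields one line at a time, line break included
    line = line.rstrip("\n")   # drop the trailing line break before splitting
    weather_data.append(line.split(","))

print(weather_data)  # [['day1', ' sunny'], ['day2', ' rain']]
```

Each line still carries its trailing line break when you iterate this way, so it has to be stripped before splitting. In exchange, the whole file is never held in memory as one string.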

Upvotes: 1

Kasravnd

Reputation: 107357

A more Pythonic and flexible way to deal with CSV files is to use the csv module instead of reading the file as raw text:

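import csv
# In Python 3, open the file in text mode with newline='' (use 'rb' only in Python 2)
with open("la_weather.csv", newline='') as f:
    spamreader = csv.reader(f, delimiter=',')
    for row in spamreader:
        print(row)  # do stuff with each row here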

Here spamreader is a reader object, and looping over it gives you each row as a list of strings.

If you want all of the rows in one list, you can just convert spamreader to a list:

with open("la_weather.csv", newline='') as f:
    spamreader = csv.reader(f, delimiter=',')
    print(list(spamreader))  # print() with parentheses for Python 3
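
For a version you can run without the file, here is a small sketch assuming Python 3. The sample text is made up and io.StringIO stands in for the opened file. skipinitialspace=True is an optional csv.reader setting that drops the space after each comma, which the sample rows in the question happen to have:

```python
import csv
import io

# Made-up sample standing in for la_weather.csv.
sample = io.StringIO("day1, sunny\nday2, rain\n")

# skipinitialspace=True ignores whitespace right after each delimiter.
reader = csv.reader(sample, delimiter=",", skipinitialspace=True)
weather_data = list(reader)

print(weather_data)  # [['day1', 'sunny'], ['day2', 'rain']]
```

The result is the same list of lists as the manual split approach, minus the leading spaces.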

Upvotes: 0
