Reputation: 295
I'm trying to understand and visualise the process of parsing a raw CSV data file in Python, from dataquest.io's training course.
I understand that rows = data.split('\n') splits the long string read from the CSV file into rows wherever there is a line break, i.e.:
day1, sunny, \n day2, rain \n
becomes
day1, sunny
day2, rain
I thought the for loop would further break the data into something like:
day1
sunny
day2
rain
Instead, the course seems to imply that it actually becomes a list of lists. I don't understand why that happens.
weather_data = []
f = open("la_weather.csv", 'r')
data = f.read()
rows = data.split('\n')
for row in rows:
    split_row = row.split(",")
    weather_data.append(split_row)
Upvotes: 2
Views: 350
Reputation: 46
I'm ignoring the CSV stuff and concentrating just on your list misunderstanding. When you split the text on line breaks, it becomes a list of strings. That is, rows becomes ["day1, sunny", "day2, rain"].
The for statement, applied to a list, iterates through the elements of that list. So the first time through, row will be "day1, sunny"; the second time through, it will be "day2, rain"; and so on.
Each iteration of the for loop creates a new list by splitting row at the commas, e.g. ["day1", " sunny"]. Each of these lists is appended to the weather_data list you created at the start, so you end up with a list of lists, i.e. [['day1', ' sunny'], ['day2', ' rain']]. If you wanted the flat list ['day1', ' sunny', 'day2', ' rain'], you could do:
for row in rows:
    split_row = row.split(",")
    for ele in split_row:
        weather_data.append(ele)
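To see the two outcomes side by side, here is a minimal sketch. The two-line string is a made-up stand-in for the contents of la_weather.csv, and flat_data is a name introduced only for this illustration:

```python
# Hypothetical two-line sample standing in for the contents of la_weather.csv.
data = "day1, sunny\nday2, rain"

rows = data.split("\n")              # one string per line
print(rows)                          # ['day1, sunny', 'day2, rain']

weather_data = []                    # nested result: one inner list per row
flat_data = []                       # flat result: every field in one list
for row in rows:
    split_row = row.split(",")       # e.g. ['day1', ' sunny']
    weather_data.append(split_row)   # adds the list itself as ONE element
    flat_data.extend(split_row)      # adds each field as its own element

print(weather_data)  # [['day1', ' sunny'], ['day2', ' rain']]
print(flat_data)     # ['day1', ' sunny', 'day2', ' rain']
```

append adds whatever you give it as a single item, which is why appending a list produces nesting; extend does the same job as the inner for loop above.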
Upvotes: 3
Reputation: 600059
That code does make it a list of lists.
As you say, the first split converts the data into a list, one element per line.
Then, for each line, the second split converts it into another list, one element per column.
And then the second list is appended, as a single item, to the weather_data list, which is now, as the instructions say, a list of lists.
Note that this code isn't very good. Quite apart from the fact that you would normally use the csv module, as others have pointed out, you would not usually do f.read() and then split the result. You would just do for line in f, which iterates over the file one line at a time.
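As a rough sketch of that approach, assuming an in-memory io.StringIO object as a stand-in for the real file (the sample text is made up):

```python
import io

# Hypothetical in-memory file standing in for open("la_weather.csv").
f = io.StringIO("day1, sunny\nday2, rain\n")

weather_data = []
for line in f:                    # yields one line at a time
    line = line.rstrip("\n")      # each line still carries its newline
    weather_data.append(line.split(","))

print(weather_data)  # [['day1', ' sunny'], ['day2', ' rain']]
```

One side effect worth noting: when the text ends with a line break, data.split('\n') leaves a trailing empty string in rows, whereas iterating over the file does not produce that extra empty row.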
Upvotes: 1
Reputation: 107357
A more Pythonic and flexible way of dealing with CSV files is to use the csv module instead of reading the file as raw text:
import csv
# Python 3: open in text mode with newline='' (Python 2 used 'rb' here)
with open("la_weather.csv", newline='') as f:
    spamreader = csv.reader(f, delimiter=',')
    for row in spamreader:
        pass  # do stuff with row
Here spamreader is a reader object; looping over it yields each row as a list of strings.
If you want all of the rows in a single list, you can just convert spamreader to a list:
# Python 3: text mode with newline='', and print is a function
with open("la_weather.csv", newline='') as f:
    spamreader = csv.reader(f, delimiter=',')
    print(list(spamreader))
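To illustrate why the csv module is the more flexible option, here is a small sketch using made-up sample data held in an io.StringIO object instead of the real file. The quoted field containing a comma is an assumption added for the example, not something taken from la_weather.csv:

```python
import csv
import io

# Hypothetical sample data; the second field of the first row contains a comma.
f = io.StringIO('day1,"sunny, windy"\nday2,rain\n')

rows = list(csv.reader(f, delimiter=","))
print(rows)   # [['day1', 'sunny, windy'], ['day2', 'rain']]

# A plain str.split breaks the quoted field apart instead.
naive = 'day1,"sunny, windy"'.split(",")
print(naive)  # ['day1', '"sunny', ' windy"']
```

Either way the result is a list of lists, one inner list per row; the reader simply handles quoting for you.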
Upvotes: 0