Reputation: 565
Here's a small excerpt of a CSV file that I'm trying to manipulate. Each line in the CSV is a string:
"address,bathrooms,bedrooms,built,lot,saledate,sale price,squarefeet"
"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"
"3277 Chamberlain Cir, Ann Arbor, MI Real Estate",3,3,2002,0.32 ac,20140905,315000,"1,401"
"2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate",4,4,2005,0.50 ac,20140904,790000,"3,972"
"1336 Nottington Ct, Ann Arbor, MI Real Estate",3,3,2002,,20140904,332350,"1,521"
"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,
"545 Allison Dr, Ann Arbor, MI Real Estate",2,2,,0.29 ac,20140904,159900,"1,400"
I would like to make each line a list, separated like so:
["1116 Fountain St, Ann Arbor, MI Real Estate", 2, 4, 1949, 0.62, 20140905, 469900, 1910]
I would like the first item (the address) to be a string and the rest to be ints and floats. Note the 0.62: I want to be able to replace "0.62 ac" with 0.62. I tried splitting each line, but line.split(',') won't work because the address itself contains two commas, so it would be split as well. Is there a simpler way to do this?
I'd appreciate any suggestions.
Thanks.
Upvotes: 0
Views: 168
Reputation: 87134
First of all, use the csv module. It will handle the quoted fields for you and won't break the field up if it contains embedded commas.
import csv

with open('input.csv') as f:
    reader = csv.reader(f)
    next(reader)  # throw away the header
    for row in reader:
        print(row)
This produces:
['1116 Fountain St, Ann Arbor, MI Real Estate', '2', '4', '1949', '0.62 ac', '20140905', '469900', '1,910']
['3277 Chamberlain Cir, Ann Arbor, MI Real Estate', '3', '3', '2002', '0.32 ac', '20140905', '315000', '1,401']
['2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate', '4', '4', '2005', '0.50 ac', '20140904', '790000', '3,972']
['1336 Nottington Ct, Ann Arbor, MI Real Estate', '3', '3', '2002', '', '20140904', '332350', '1,521']
['344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate', '', '', '', '6,534', '20140904', '345000', '']
['545 Allison Dr, Ann Arbor, MI Real Estate', '2', '2', '', '0.29 ac', '20140904', '159900', '1,400']
So you can see that the CSV reader handles the fields properly. Next you need to convert the fields to ints and floats as appropriate.
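As a rough sketch of that conversion step (Python 3, not necessarily the only way to do it): the helper names to_number and convert_row are made up for illustration, and the code assumes that thousands separators and a trailing "ac" can simply be stripped, and that an empty field should become None. Note that stripping "ac" discards the unit, so a lot value like "6,534" (which is probably not in acres) ends up as a plain 6534.

```python
import csv
import io

def to_number(text):
    """Convert one CSV field to an int or float; return None if it is empty.

    Hypothetical helper: strips thousands separators and a trailing 'ac'.
    """
    cleaned = text.replace(',', '').strip()
    if cleaned.endswith('ac'):
        cleaned = cleaned[:-2].strip()
    if not cleaned:
        return None  # assumption: missing values are represented as None
    try:
        return int(cleaned)
    except ValueError:
        return float(cleaned)

def convert_row(row):
    """Keep the address (first field) as a string and convert the rest."""
    return [row[0]] + [to_number(field) for field in row[1:]]

# Two rows from the question, read from a string instead of input.csv
sample = '''address,bathrooms,bedrooms,built,lot,saledate,sale price,squarefeet
"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"
"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,
'''

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header
rows = [convert_row(row) for row in reader]
for row in rows:
    print(row)
```

To use it on the real file, pass the open file object to csv.reader instead of the io.StringIO wrapper. If None is not a convenient placeholder for missing values, return something else (for example 0 or float('nan')) from to_number.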
Upvotes: 2