jason adams

Reputation: 565

Python CSV file manipulation

Here's a small excerpt of a CSV file that I'm trying to manipulate. Each line in the CSV is a string:

"address,bathrooms,bedrooms,built,lot,saledate,sale price,squarefeet"
"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"
"3277 Chamberlain Cir, Ann Arbor, MI Real Estate",3,3,2002,0.32 ac,20140905,315000,"1,401"
"2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate",4,4,2005,0.50 ac,20140904,790000,"3,972"
"1336 Nottington Ct, Ann Arbor, MI Real Estate",3,3,2002,,20140904,332350,"1,521"
"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,
"545 Allison Dr, Ann Arbor, MI Real Estate",2,2,,0.29 ac,20140904,159900,"1,400"

I would like to make each line a list, separated like so:

["1116 Fountain St, Ann Arbor, MI Real Estate", 2, 4, 1949, 0.62, 20140905, 469900, 1910]

I would like the first item (the address) to be a string and the rest to be ints and floats. Note the 0.62 in particular: I want to replace 0.62 ac with 0.62. I tried splitting each line, but line.split(',') won't work because the address itself contains two commas, and those would be split as well. Is there a simpler way to do this?

I'd appreciate any suggestions.

Thanks.

Upvotes: 0

Views: 168

Answers (1)

mhawke

Reputation: 87134

First of all, use the csv module. It will handle the quoted fields for you and won't break the field up if it contains embedded commas.

import csv

with open('input.csv') as f:
    reader = csv.reader(f)
    next(reader)   # throw away the header
    for row in reader:
        print(row)   # each row is a list of strings

This produces:

['1116 Fountain St, Ann Arbor, MI Real Estate', '2', '4', '1949', '0.62 ac', '20140905', '469900', '1,910']
['3277 Chamberlain Cir, Ann Arbor, MI Real Estate', '3', '3', '2002', '0.32 ac', '20140905', '315000', '1,401']
['2889 Walnut Ridge Dr, Ann Arbor, MI Real Estate', '4', '4', '2005', '0.50 ac', '20140904', '790000', '3,972']
['1336 Nottington Ct, Ann Arbor, MI Real Estate', '3', '3', '2002', '', '20140904', '332350', '1,521']
['344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate', '', '', '', '6,534', '20140904', '345000', '']
['545 Allison Dr, Ann Arbor, MI Real Estate', '2', '2', '', '0.29 ac', '20140904', '159900', '1,400']

So you can see that the CSV reader handles the fields properly. Next you need to convert the fields to ints and floats as appropriate.
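One way that conversion step might look is sketched below. The helper names to_number and convert_row are hypothetical, and the choices they make are assumptions rather than anything the question specifies: thousands separators are stripped, a trailing " ac" unit is dropped, and blank fields become None. Note that the lot value "6,534" has no "ac" suffix and may be in a different unit; this sketch simply converts it to 6534.

```python
import csv
import io

# Two rows from the question, plus the header, so the sketch is self-contained.
sample = '''address,bathrooms,bedrooms,built,lot,saledate,sale price,squarefeet
"1116 Fountain St, Ann Arbor, MI Real Estate",2,4,1949,0.62 ac,20140905,469900,"1,910"
"344 Sedgewood Ln # 14, Ann Arbor, MI Real Estate",,,,"6,534",20140904,345000,
'''

def to_number(text):
    """Convert one field to an int or float; return None for a blank field.

    Assumptions: commas are thousands separators, and a trailing ' ac'
    is a unit suffix that can be dropped.
    """
    cleaned = text.replace(',', '').strip()
    if cleaned.endswith(' ac'):
        cleaned = cleaned[:-3].strip()
    if not cleaned:
        return None
    try:
        return int(cleaned)
    except ValueError:
        return float(cleaned)

def convert_row(row):
    """Keep the address (first field) as a string; convert the rest."""
    return [row[0]] + [to_number(field) for field in row[1:]]

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header
rows = [convert_row(row) for row in reader]
for row in rows:
    print(row)
```

To run this against the real file, replace io.StringIO(sample) with the file object from open('input.csv'). If None is not a suitable placeholder for missing values, to_number is the single place to change that.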

Upvotes: 2
