ChipsAhoy
ChipsAhoy

Reputation: 1

Import CSV and create one list for each column in Python

I am processing a CSV file in python thats delimited by a comma (,).

Each column is a sampled parameter, for instance column 0 is time, sampled at once a second, column 1 is altitude sampled at 4 times a second, etc.

So columns will look like as below:

Column 0 -> ["Time", 0, " "," "," ",1] 
Column 1 -> ["Altitude", 100, 200, 300, 400]

I am trying to create a list for each column that captures its name and all its data. That way I can do calculations and organize my data into a new file automatically (the sampled data I am working with has substantial number of rows)

I want to do this for any file not just one, so the number of columns can vary.

Normally if every file was consistent I would do something like:

import csv
time =[]
alt = []
dct = {}
with open('test.csv',"r") as csvfile:
    csv_f = csv.reader(csvfile)
    for row in csv_f:
        header.append(row[0])
        alt.append(row[1]) #etc for all columns

I am pretty new in python. Is this a good way to tackle this, if not what is better methodology?

Thanks for your time

Upvotes: 0

Views: 3823

Answers (2)

ChipsAhoy
ChipsAhoy

Reputation: 1

I think I made my problem more simpler and just focused on one column.

What I ultimately wanted to do was to interpolate to the highest sampling rate. So here is what I came up with... Please let me know if I can do anything more efficient. I used A LOT of searching on this site to help build this. Again I am new at Python (about 2-3 weeks but some former programming experience)

import csv
header = []
#initialize variables
loc_int = 0
loc_fin = 0
temp_i = 0
temp_f = 0

with open('test2.csv',"r") as csvfile: # open csv file
    csv_f = csv.reader(csvfile)
    for row in csv_f:
        header.append(row[0]) #make a list that consists of all content in column A

for x in range(0,len(header)-1): #go through entire column

    if header[x].isdigit() and header[x+1]=="": # find lower bound of sample to be interpolated
        loc_int = x
        temp_i = int(header[x])

    elif header[x+1].isdigit() and header[x]=="": # find upper bound of sample to be interpolated

        loc_fin = x
        temp_f = int(header[x+1])

    if temp_f>temp_i: #calculate interpolated values
        f_min_i = temp_f - temp_i
        interp = f_min_i/float((loc_fin+1)-loc_int)

        for y in range(loc_int, loc_fin+1):
            header[y] = temp_i + interp*(y-loc_int)

print header

with open("output.csv", 'wb') as g: #write to new file
    writer = csv.writer(g)
    for item in header:
        writer.writerow([item])

I couldn't figure out how to write my new list "header" with its interpolated values and replace it with column A of my old file , test2.csv.

Anywho thank you very much for looking...

Upvotes: 0

Yarnspinner
Yarnspinner

Reputation: 882

Pandas will probably work best for you. If you use csv_read from pandas, it will create a DataFrame based on the column. It's roughly a dictionary of lists.

You can also use the .tolist() functionality of pandas to convert it to a list if you want a list specifically.

import pandas as pd
data = pd.read_csv("soqn.csv")
dict_of_lists = {}

for column_name in data.columns:
    temp_list = data[column_name].tolist()
    dict_of_lists[column_name] = temp_list

print dict_of_lists

EDIT: 
dict_of_lists={column_name: data[column_name].tolist() for column_name in data.columns}
#This list comprehension might work faster.

Upvotes: 1

Related Questions