Reputation: 593
I've searched thoroughly and can't quite find the guidance I'm looking for on this issue, so I hope this question is not redundant. I have several .csv files that represent raster images. I'd like to perform some statistical analysis on them, so I am trying to create a Pandas dataframe for each file so I can slice 'em, dice 'em, and plot 'em... but I am having trouble looping through the list of files to create a DF with a meaningful name for each file.
Here is what I have so far:
import glob
import os
from pandas import *
#list of .csv files
#I'd like to turn each file into a dataframe
dataList = glob.glob(r'C:\Users\Charlie\Desktop\Qvik\textRasters\*.csv')
#name that I'd like to use for each data frame
nameList = []
for raster in dataList:
    path_list = raster.split(os.sep)
    name = path_list[6][:-4]
    nameList.append(name)
#zip these lists into a dict
dataDct = {}
for k, v in zip(nameList,dataList):
    dataDct[k] = dataDct.get(k,"") + v
dataDct
So now I have a dict where the key is the name I want for each dataframe and the value is the path for read_csv(path):
{'Aspect': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Aspect.csv',
'Curvature': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Curvature.csv',
'NormalZ': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\NormalZ.csv',
'Slope': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Slope.csv',
'SnowDepth': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\SnowDepth.csv',
'Vegetation': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Vegetation.csv',
'Z': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Z.csv'}
My instinct was to try variations of this:
for k, v in dataDct.iteritems():
    k = read_csv(v)
but that leaves me with a single dataframe, 'k' , that is filled with data from the last file read in by the loop.
I'm probably missing something fundamental here, but I am starting to spin my wheels on this so I thought I'd ask y'all... any ideas are appreciated!
Cheers.
Upvotes: 5
Views: 1998
Reputation: 394469
It's unclear why you're overwriting your object here: assigning to k inside the loop just rebinds the loop variable on every pass. I think you want either a list or a dict of the dfs:
df_list = []
for k, v in dataDct.iteritems():  # use dataDct.items() on Python 3
    df_list.append(read_csv(v))
or
df_dict = {}
for k, v in dataDct.iteritems():  # use dataDct.items() on Python 3
    df_dict[k] = read_csv(v)
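To make the difference concrete, here is a minimal sketch, assuming Python 3 and pandas imported as pd. The in-memory CSV strings and the names sources, last_only, and kept_names are made up for illustration and stand in for the real files on disk.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files on disk.
sources = {
    "Slope": "x,y\n1,2\n3,4\n",
    "SnowDepth": "x,y\n5,6\n7,8\n",
}

# Assigning to the loop variable only rebinds the name `k`;
# each DataFrame is dropped when the next iteration rebinds it again.
for k, v in sources.items():
    k = pd.read_csv(io.StringIO(v))
last_only = k  # only the frame from the last entry survives

# Storing each frame under its key keeps all of them.
df_dict = {}
for name, text in sources.items():
    df_dict[name] = pd.read_csv(io.StringIO(text))

kept_names = sorted(df_dict)
```

After the first loop only one frame is reachable, while df_dict holds one frame per name.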
Upvotes: 1
Reputation: 39943
Are you trying to get all of the data frames separately in a dictionary, one data frame per key? If so, this will leave you with a dict that has the same keys you showed, but each value will be the data frame read from that file instead of its path.
dataDct = {}
for k, v in zip(nameList,dataList):
    dataDct[k] = read_csv(v)
So now, you could do this for example:
dataDct['SnowDepth'][['cola','colb']].plot()
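If you are on Python 3, the name list and the dict can probably be built in one step. This is only a sketch: the helper name load_csv_folder is hypothetical, and it assumes every .csv file sits directly in one folder and that the file name without its extension is the key you want.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Return {file name without extension: DataFrame} for each .csv in `folder`."""
    # Path.stem drops the directory and the ".csv" suffix, so there is no need
    # to split the path by hand or rely on a fixed folder depth.
    return {
        path.stem: pd.read_csv(path)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

With the folder from the question, that would be called as frames = load_csv_folder(r'C:\Users\Charlie\Desktop\Qvik\textRasters'), after which frames['SnowDepth'] is the data frame for SnowDepth.csv.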
Upvotes: 3