Charles Parr

Reputation: 593

How to create separate Pandas DataFrames for each CSV file and give them meaningful names?

I've searched thoroughly and can't quite find the guidance I am looking for on this issue, so I hope this question is not redundant. I have several .csv files that represent raster images. I'd like to perform some statistical analysis on them, so I am trying to create a Pandas DataFrame for each file so I can slice 'em, dice 'em, and plot 'em, but I am having trouble looping through the list of files to create a DataFrame with a meaningful name for each file.

Here is what I have so far:

import glob
import os
from pandas import *

#list of .csv files
#I'd like to turn each file into a dataframe
dataList = glob.glob(r'C:\Users\Charlie\Desktop\Qvik\textRasters\*.csv')

#name that I'd like to use for each data frame
nameList = []
for raster in dataList:
    path_list = raster.split(os.sep)
    name = path_list[6][:-4]
    nameList.append(name)

#zip these lists into a dict

dataDct = {}
for k, v in zip(nameList,dataList):
    dataDct[k] = dataDct.get(k,"") + v
dataDct

So now I have a dict where the key is the name I want for each dataframe and the value is the path for read_csv(path):

{'Aspect': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Aspect.csv',
 'Curvature': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Curvature.csv',
 'NormalZ': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\NormalZ.csv',
 'Slope': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Slope.csv',
 'SnowDepth': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\SnowDepth.csv',
 'Vegetation': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Vegetation.csv',
 'Z': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Z.csv'}

My instinct was to try variations of this:

for k, v in dataDct.iteritems():
    k = read_csv(v)

but that leaves me with a single dataframe, 'k' , that is filled with data from the last file read in by the loop.

I'm probably missing something fundamental here, but I am starting to spin my wheels on this, so I thought I'd ask y'all. Any ideas are appreciated!

Cheers.

Upvotes: 5

Views: 1998

Answers (2)

EdChum

Reputation: 394469

It's unclear why you're overwriting your object here. I think you want either a list or a dict of the DataFrames:

# one DataFrame per file, in the dict's iteration order
df_list = []
for k, v in dataDct.items():  # .items() works on Python 2 and 3
    df_list.append(read_csv(v))

or

# one DataFrame per file, keyed by the name you chose
df_dict = {}
for k, v in dataDct.items():  # .items() works on Python 2 and 3
    df_dict[k] = read_csv(v)
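
The dict version can also be written in one step, building the names and the DataFrames together. This is only a sketch: the temporary folder, the two tiny files, and the helper name load_csv_folder are made up so the example runs on its own, and it assumes the file name without its extension is the name you want.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the raster folder: two tiny CSV files in a temp directory.
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({"cola": [1, 2], "colb": [3, 4]}).to_csv(
    os.path.join(tmp_dir, "Slope.csv"), index=False)
pd.DataFrame({"cola": [5, 6], "colb": [7, 8]}).to_csv(
    os.path.join(tmp_dir, "SnowDepth.csv"), index=False)


def load_csv_folder(pattern):
    """Return {file name without extension: DataFrame} for each matching file."""
    return {
        # basename drops the directories, splitext drops the ".csv" suffix
        os.path.splitext(os.path.basename(path))[0]: pd.read_csv(path)
        for path in glob.glob(pattern)
    }


df_dict = load_csv_folder(os.path.join(tmp_dir, "*.csv"))
print(sorted(df_dict))
```

Taking the name from the file name itself avoids counting path components, so it keeps working if the folder is moved to a different depth.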

Upvotes: 1

Donald Miner

Reputation: 39943

Are you trying to get all of the data frames separately in a dictionary, one data frame per key? If so, this will leave you with the same dict you showed, but with a data frame, rather than a file path, under each key.

dataDct = {}
for k, v in zip(nameList,dataList):
    dataDct[k] = read_csv(v)

Now you could do this, for example:

dataDct['SnowDepth'][['cola','colb']].plot()
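
If you later want to compare the rasters side by side, a dict of data frames can also be stacked into a single frame with the dict keys as an outer index level. A rough sketch, using small in-memory frames in place of the real files; 'cola' and 'colb' are placeholder column names as above, and the index level names 'raster' and 'row' are my own choice:

```python
import pandas as pd

# Hypothetical stand-ins for the frames that read_csv would return.
dataDct = {
    "Slope": pd.DataFrame({"cola": [1.0, 2.0], "colb": [3.0, 4.0]}),
    "SnowDepth": pd.DataFrame({"cola": [5.0, 6.0], "colb": [7.0, 8.0]}),
}

# Passing a dict to concat uses its keys as the outer level of a MultiIndex.
combined = pd.concat(dataDct, names=["raster", "row"])

# Each raster is still reachable by name...
snow = combined.loc["SnowDepth"]

# ...and per-raster statistics come from grouping on the outer level.
means = combined.groupby(level="raster")["cola"].mean()
print(means)
```

This only makes sense when the files share the same columns; otherwise keeping them apart in the dict is simpler.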

Upvotes: 3
