Casper

Reputation: 73

Use for loop to create dataframes from a list

Python/Pandas beginner here. I have a list with names which each represent a csv file on my computer. I would like to create a separate pandas dataframe for each of these csv files and use the same names for the dataframes. I can do this in a very inefficient way by creating a separate line of code for each name in the list and adding/removing these lines of code manually as the list changes over time, something like this when I have 3 names Mark, Frank and Peter:

path = 'C:\\Users\\Me\\Desktop\\Names\\'

Mark = pd.read_csv(path+"Mark.csv")
Frank = pd.read_csv(path+"Frank.csv")
Peter = pd.read_csv(path+"Peter.csv")

Problem is that I will usually have a dozen or so names and they change frequently, so this is not very efficient. Instead I figured I would keep a list of the names to update when needed and use a for loop to do the rest:

path = 'C:\\Users\\Me\\Desktop\\Names\\'
names = ['Mark','Frank','Peter']

for name in names:
    name = pd.read_csv(path+name+'.csv')

This does not produce an error, but instead of creating 3 different dataframes Mark, Frank and Peter, it creates a single dataframe 'name' using only the data from the first entry in the list. How do I make this work so that it creates a separate dataframe for each name in the list and gives each dataframe the same name as the csv file that was read?

Upvotes: 2

Views: 2019

Answers (4)

JOHN DeVoe

Reputation: 1


for name in names:  
    globals()[name] = pd.read_csv(path+name+'.csv')

Upvotes: 0

Diptangsu Goswami

Reputation: 5975

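name here is the variable used to iterate over the list. Assigning to it inside the loop only rebinds that variable; it doesn't create a variable called Mark, Frank or Peter, so you won't see any noticeable change. Collect the dataframes in a list instead: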

import pandas as pd

# Trailing separator, so that path + name + '.csv' is a valid file path
path = 'C:\\Users\\Me\\Desktop\\Names\\'
names = ['Mark','Frank','Peter']
dfs = []

for name in names:
    dfs.append(pd.read_csv(path + name + '.csv'))

# OR
dfs = [
    pd.read_csv(path + name + '.csv')
    for name in names
]

Or, you can use a dict to map each name to its dataframe.

path = 'C:\\Users\\Me\\Desktop\\Names\\'
names = ['Mark','Frank','Peter']
dfs = {}

for name in names:
    dfs[name] = pd.read_csv(path + name + '.csv')

# OR
dfs = {
    name: pd.read_csv(path + name + '.csv')
    for name in names
}
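
To show how the dict is used afterwards, here is a minimal sketch. It assumes small made-up CSV contents held in memory (the csv_text dict and its columns are hypothetical) so that it runs without any files on disk; with real files you would pass the path to pd.read_csv as above.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV text standing in for Mark.csv, Frank.csv, Peter.csv.
csv_text = {
    'Mark': "item,qty\napples,3\n",
    'Frank': "item,qty\npears,5\n",
    'Peter': "item,qty\nplums,7\n",
}
names = ['Mark', 'Frank', 'Peter']

# Same dict comprehension as above, reading from memory instead of disk.
dfs = {name: pd.read_csv(io.StringIO(csv_text[name])) for name in names}

mark = dfs['Mark']              # look up one dataframe by name
for name, df in dfs.items():    # or loop over all of them
    print(name, df.shape)
```

The key plays the role the variable name would have played, and the dict grows or shrinks with the names list without any code changes.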

Upvotes: 3

Daniel Nudelman

Reputation: 401

Two options. If you know the names of all your csv files, you can edit your code and just add a list to hold all the dataframes. Example:

# Trailing separator, so that path + name + '.csv' is a valid file path
path = 'C:\\Users\\Me\\Desktop\\Names\\'
names = ['Mark','Frank','Peter']
dfs = []

for name in names:
    dfs.append(pd.read_csv(path + name + '.csv'))

Otherwise, you can use os.listdir() to find every file with a csv extension in the folder and read all of them:

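import os
import pandas as pd

path = 'C:\\Users\\Me\\Desktop\\Names'
dfs = []
for file in os.listdir(path):
    # Keep only files whose name ends in .csv
    if file.endswith('.csv'):
        # os.path.join inserts the path separator between folder and file name
        dfs.append(pd.read_csv(os.path.join(path, file)))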
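
A variation on the same idea, in case you also want to keep track of which dataframe came from which file: this sketch swaps os.listdir() for pathlib's Path.glob() and stores the results in a dict keyed by file name. The temporary folder and its file contents are made up purely so the example can run anywhere; in practice you would point folder at your own directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical setup: a temporary folder with two small CSVs and one other file.
folder = Path(tempfile.mkdtemp())
(folder / 'Mark.csv').write_text("item,qty\napples,3\n")
(folder / 'Frank.csv').write_text("item,qty\npears,5\n")
(folder / 'notes.txt').write_text("not a csv\n")

# glob('*.csv') yields only the CSV paths; .stem is the file name without extension.
dfs = {
    csv_path.stem: pd.read_csv(csv_path)
    for csv_path in sorted(folder.glob('*.csv'))
}

print(list(dfs))  # ['Frank', 'Mark']
```

With this approach there is no names list to maintain at all; adding or removing a csv file in the folder changes what gets loaded.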

Upvotes: 2

Karl Knechtel

Reputation: 61643

it creates a single dataframe 'name' using only the data from the first entry in the list.

It uses the last entry, because each time through the loop, name is replaced with the result of the next read_csv call. (Actually, it's first set to one of the values from the list, and then replaced with the read_csv result; to avoid confusion, you should use different names for your loop variable and your outputs. Especially since name doesn't make any sense as the thing to call your result :) )
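
This is plain Python name binding and has nothing to do with pandas, so it can be checked without any csv files. In the sketch below, fake_read_csv is a hypothetical stand-in for pd.read_csv that just returns a string:

```python
# Hypothetical stand-in for pd.read_csv, so the sketch runs without any files.
def fake_read_csv(filename):
    return f"<data from {filename}>"

names = ['Mark', 'Frank', 'Peter']

for name in names:
    # Rebinds the loop variable; no variable called Mark, Frank or Peter is created.
    name = fake_read_csv(name + '.csv')

print(name)   # <data from Peter.csv>  (the last entry, not the first)
print(names)  # ['Mark', 'Frank', 'Peter']  (the list itself is unchanged)
```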

How do I make this work

You had a list of input values, and thus you want a list of output values as well. The simplest approach is to use a list comprehension, describing the list you want in terms of the list you start with:

csvs = [
    pd.read_csv(f'{path}{name}.csv')
    for name in names
]

It works the same way as the explicit loop, except it builds a list automatically from the value that's computed each time through. It means what it says, in order: "csvs is a list of these pd.read_csv results, computed once for each of the name values that is in names".
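
If you want to convince yourself that the two forms are equivalent, here is a rough sketch. It assumes made-up CSV text held in memory (csv_text and the column x are hypothetical) instead of files under path:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV text, keyed by name, standing in for the files on disk.
csv_text = {'Mark': "x\n1\n", 'Frank': "x\n2\n", 'Peter': "x\n3\n"}
names = ['Mark', 'Frank', 'Peter']

# Explicit loop: start empty and append one result per name.
loop_result = []
for name in names:
    loop_result.append(pd.read_csv(io.StringIO(csv_text[name])))

# List comprehension: the same list described in one expression.
csvs = [pd.read_csv(io.StringIO(csv_text[name])) for name in names]

# Both lists hold equal dataframes, in the same order as names.
same = all(a.equals(b) for a, b in zip(loop_result, csvs))
print(same)  # True

# Because the order matches names, zip pairs each name with its dataframe.
for name, df in zip(names, csvs):
    print(name, df['x'].tolist())
```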

Upvotes: 3
