TomCrow

Reputation: 47

Pandas - Reading CSVs to dataframes in a FOR loop then appending to a master DF is returning a blank DF

I've searched for about an hour for an answer to this and none of the solutions I've found are working. I'm trying to get a folder full of CSVs into a single dataframe, to output to one big csv. Here's my current code:

import os

sourceLoc = "SOURCE"
destLoc = sourceLoc + "MasterData.csv"
masterDF = pd.DataFrame([])

for file in os.listdir(sourceLoc):
        workingDF = pd.read_csv(sourceLoc + file)
        print(workingDF)
        masterDF.append(workingDF)
        
print(masterDF)

SOURCE is a folder path; I've removed the real one because it's a work network path. The loop is reading the CSVs into workingDF, because the data is printed to the console when I run it, but it also reports 349 rows for each file, and none of them have that many rows of data in them.

When I print masterDF it prints Empty DataFrame Columns: [] Index: []

My code is from this solution but that example is using xlsx files and I'm not sure what changes, if any, are needed to get it to work with CSVs. The Pandas documentation on .append and read_csv is quite limited and doesn't indicate anything specific I'm doing wrong.

Any help would be appreciated.

Upvotes: 1

Views: 668

Answers (3)

Adoni5

Reputation: 61

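There are a couple of things wrong with your code, but the main one is that DataFrame.append returns a new dataframe instead of modifying the original in place, so the result of masterDF.append(workingDF) is thrown away on every iteration. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on current versions use pd.concat instead.) With append you would have to do: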

masterDF = masterDF.append(workingDF)
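
To see the difference in isolation, here is a minimal sketch with two made-up frames (first and second are illustrative names, not from your code). It uses pd.concat, which follows the same return-a-new-object rule and also works on pandas versions where DataFrame.append is no longer available.

```python
import pandas as pd

# Two small stand-in frames; the names and data are made up for illustration.
first = pd.DataFrame({"a": [1, 2]})
second = pd.DataFrame({"a": [3]})

# Result discarded: `first` is left untouched, which mirrors the loop in the question.
pd.concat([first, second])
rows_without_assignment = len(first)

# Result assigned back: `first` now refers to the combined frame.
first = pd.concat([first, second], ignore_index=True)
rows_with_assignment = len(first)

print(rows_without_assignment, rows_with_assignment)
```

The only change between the two calls is the assignment, and that is the part missing from the loop.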

I also like the approach taken by I_Al-thamary - concat will probably be faster.

One last suggestion: instead of glob, check out pathlib. Note that rglob searches subfolders as well.

import pandas as pd
from pathlib import Path
path = Path("your path")
df = pd.concat(map(pd.read_csv, path.rglob("*.csv")))
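
One thing to be aware of when choosing between the two: Path.rglob walks subfolders, while Path.glob with the same pattern only looks at the folder itself. A small sketch on a throwaway temporary directory (the file and folder names are made up) shows the difference:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    # One CSV at the top level, one inside a subfolder.
    (root / "top.csv").write_text("a\n1\n")
    (root / "sub" / "nested.csv").write_text("a\n2\n")

    # glob: only files directly inside root.
    flat = sorted(p.name for p in root.glob("*.csv"))
    # rglob: files in root and in every subfolder below it.
    recursive = sorted(p.name for p in root.rglob("*.csv"))

print(flat)
print(recursive)
```

If the folder has no subfolders the two behave the same, so path.glob("*.csv") is the closer match to os.listdir.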
     

Upvotes: 1

I_Al-thamary

Reputation: 4043

You can use glob to pick up only the CSV files and concatenate them in one call:

import glob
import pandas as pd
import os
path = "your path"
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join(path,'*.csv'))))
print(df)

Upvotes: 1

Raymond Kwok

Reputation: 2541

You can store them all in a list and pd.concat them at the end.

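# Read every file in the folder into its own DataFrame.
dfs = [
    pd.read_csv(os.path.join(sourceLoc, file))
    for file in os.listdir(sourceLoc)
]

# Concatenate the whole list in one call (the list is dfs, not df).
masterDF = pd.concat(dfs)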
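
For a version that can be run end to end, here is a rough sketch under a few assumptions: combine_csvs is a hypothetical helper name, the sample files are made up, and a temporary directory stands in for the real folder. It also skips the output file by name, since in the question MasterData.csv is written into the same folder that is being read and would otherwise be picked up on the next run.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_csvs(source_dir, output_name="MasterData.csv"):
    """Read every CSV in source_dir, except the output file, into one frame."""
    source = Path(source_dir)
    files = sorted(p for p in source.glob("*.csv") if p.name != output_name)
    if not files:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    frames = [pd.read_csv(p) for p in files]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Made-up sample data standing in for the real CSVs.
    pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(folder / "a.csv", index=False)
    pd.DataFrame({"id": [3], "value": [30]}).to_csv(folder / "b.csv", index=False)

    master = combine_csvs(folder)
    master.to_csv(folder / "MasterData.csv", index=False)

    # A second run ignores MasterData.csv, so the rows are not doubled.
    master_again = combine_csvs(folder)

print(master)
```

Writing with index=False keeps the row index out of the output file, so it does not come back as an extra column if the file is read again later.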

Upvotes: 0
