Reputation: 47
I've searched for about an hour for an answer to this and none of the solutions I've found are working. I'm trying to get a folder full of CSVs into a single dataframe, to output to one big csv. Here's my current code:
import os
import pandas as pd
sourceLoc = "SOURCE"
destLoc = sourceLoc + "MasterData.csv"
masterDF = pd.DataFrame([])
for file in os.listdir(sourceLoc):
    workingDF = pd.read_csv(sourceLoc + file)
    print(workingDF)
    masterDF.append(workingDF)
print(masterDF)
The SOURCE is a folder path, but I've had to remove it as it's a work network path. The loop is reading the CSVs into the workingDF variable, since the data is printed to the console when I run it, but it's also finding 349 rows for each file. None of them have that many rows of data in them.
When I print masterDF it prints Empty DataFrame Columns: [] Index: []
My code is from this solution but that example is using xlsx files and I'm not sure what changes, if any, are needed to get it to work with CSVs. The Pandas documentation on .append and read_csv is quite limited and doesn't indicate anything specific I'm doing wrong.
Any help would be appreciated.
Upvotes: 1
Views: 668
Reputation: 61
There are a couple of things wrong with your code, but the main one is that DataFrame.append returns a new dataframe instead of modifying the original in place. So you would have to do:
masterDF = masterDF.append(workingDF)
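To make the "returns a new dataframe" point concrete, here is a minimal sketch using pd.concat, which behaves the same way in this respect. The two small frames and their column names are made up for illustration and are not from the question.

```python
import pandas as pd

# Two small stand-in frames (hypothetical data, not from the question).
first = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
second = pd.DataFrame({"id": [3], "value": [30]})

master = first

# The combined frame is thrown away here, so master is unchanged.
pd.concat([master, second])
rows_without_reassign = len(master)

# Reassigning keeps the combined frame.
master = pd.concat([master, second], ignore_index=True)
rows_with_reassign = len(master)

print(rows_without_reassign, rows_with_reassign)
```

The same reasoning explains the empty masterDF in the question: each appended result was discarded on every pass through the loop.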
I also like the approach taken by I_Al-thamary: pd.concat will probably be faster, and on pandas 2.0 and later it is the only option, because DataFrame.append was removed there.
One last thing I would suggest: instead of using glob, check out pathlib.
import pandas as pd
from pathlib import Path
path = Path("your path")
df = pd.concat(map(pd.read_csv, path.rglob("*.csv")))
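One thing to be aware of is that rglob also walks subfolders, while glob stays in the folder itself. A rough sketch of the difference follows, using a hypothetical helper named combine_csvs and throwaway files in a temporary directory so it runs anywhere. The file names and the column x are assumptions for the demo only.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_csvs(folder, recursive=False):
    """Read every CSV in folder into one dataframe (hypothetical helper)."""
    folder = Path(folder)
    # glob stays in the folder itself; rglob also walks subfolders.
    matches = folder.rglob("*.csv") if recursive else folder.glob("*.csv")
    files = sorted(matches)
    if not files:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)


# Demo on throwaway files so the snippet does not depend on a real path.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    pd.DataFrame({"x": [1, 2]}).to_csv(root / "a.csv", index=False)
    pd.DataFrame({"x": [3]}).to_csv(root / "sub" / "b.csv", index=False)

    top_level = combine_csvs(root)
    everything = combine_csvs(root, recursive=True)
```

If the source folder has no subfolders, both variants should give the same result, so the choice only matters for nested layouts.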
Upvotes: 1
Reputation: 4043
You can use glob:
import glob
import pandas as pd
import os
path = "your path"
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join(path,'*.csv'))))
print(df)
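Since the question writes MasterData.csv into the same folder it reads from, a second run would pick up the previous output as one of the inputs. Below is a possible way around that, sketched with a hypothetical helper named read_inputs that skips the output file by name. The demo files are made up and live in a temporary directory.

```python
import glob
import os
import tempfile

import pandas as pd


def read_inputs(folder, output_name="MasterData.csv"):
    """Concatenate all CSVs in folder except the output file (hypothetical helper)."""
    pattern = os.path.join(folder, "*.csv")
    paths = sorted(
        p for p in glob.glob(pattern) if os.path.basename(p) != output_name
    )
    return pd.concat(map(pd.read_csv, paths), ignore_index=True)


with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(folder, "a.csv"), index=False)
    pd.DataFrame({"x": [3]}).to_csv(os.path.join(folder, "b.csv"), index=False)

    first_run = read_inputs(folder)
    # index=False avoids writing the row index as an extra column.
    first_run.to_csv(os.path.join(folder, "MasterData.csv"), index=False)

    # The previous output is now in the folder but is skipped.
    second_run = read_inputs(folder)
```

Writing the combined file to a different folder would avoid the problem as well, and is arguably simpler if that is an option.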
Upvotes: 1
Reputation: 2541
You can store them all in a list and pd.concat them at the end.
import os
import pandas as pd

# sourceLoc is the folder path defined in the question
dfs = [
    pd.read_csv(os.path.join(sourceLoc, file))
    for file in os.listdir(sourceLoc)
]
masterDF = pd.concat(dfs)
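Note that os.listdir returns every entry in the folder, not only CSVs. I can't tell from the question where the unexpected 349 rows come from, but tagging each row with its file name might help narrow it down. A sketch under those assumptions follows: the source_file column name and the demo files are invented for illustration.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as source_loc:
    # Throwaway inputs, including one non-CSV entry that should be skipped.
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(source_loc, "a.csv"), index=False)
    pd.DataFrame({"x": [3]}).to_csv(os.path.join(source_loc, "b.csv"), index=False)
    with open(os.path.join(source_loc, "notes.txt"), "w") as handle:
        handle.write("not a csv\n")

    dfs = []
    for name in sorted(os.listdir(source_loc)):
        if not name.lower().endswith(".csv"):
            continue  # ignore anything that is not a CSV
        frame = pd.read_csv(os.path.join(source_loc, name))
        # Record which file each row came from.
        dfs.append(frame.assign(source_file=name))

    master = pd.concat(dfs, ignore_index=True)
    rows_per_file = master["source_file"].value_counts().to_dict()
    print(rows_per_file)
```

Comparing rows_per_file with what you expect for each file should show quickly whether a particular input is being read differently from how it looks in a spreadsheet.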
Upvotes: 0