Reputation: 625
I am working on code that runs a script on multiple files in a folder. I can run the code on each file, but it only saves to one output file and overwrites that file on every iteration. How can I get this code to save each output to a separate file, preferably with a name similar to the original file's? This is what I have so far:
import os, re
import pandas as pd
directory = os.listdir('C:/Users/user/Desktop/NOV')
os.chdir('C:/Users/user/Desktop/NOV')
for file in directory:
    df = pd.read_csv(file, index_col="DateTime", parse_dates=True)
    df = df.resample('1min').mean()
    df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
    df.to_csv("newfile.csv", na_rep='NaN')
Upvotes: 0
Views: 3116
Reputation: 140307
My approach:
- use glob.glob instead of os.listdir to filter out files which aren't CSV files
- don't use os.chdir: this is bad practice because other modules may not be aware that you changed the current directory, and changing directory twice with a relative path will fail. Using glob.glob with a full path avoids that.
- save the output with a "new_" prefix in the same directory (running the script twice will create "new_new_" files, though)

Code:
import os, glob
import pandas as pd
input_dir = 'C:/Users/user/Desktop/NOV'
for file in glob.glob(os.path.join(input_dir, "*.csv")):
    df = pd.read_csv(file, index_col="DateTime", parse_dates=True)
    df = df.resample('1min').mean()
    df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
    # glob returns full paths, so keep only the file name before adding the prefix
    new_filename = os.path.join(input_dir, "new_" + os.path.basename(file))
    df.to_csv(new_filename, na_rep='NaN')
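A variation on the same idea, sketched with pathlib instead of os.path and glob (an assumption, not part of the answer above): writing the results into a separate output folder means a second run never picks up its own outputs, so the "new_new_" problem goes away. The function name process_folder, the output folder argument and the transform callback are hypothetical; the callback would hold the read_csv/resample/reindex/to_csv steps.

```python
from pathlib import Path
from typing import Callable, List


def process_folder(
    input_dir: str,
    output_dir: str,
    transform: Callable[[Path, Path], None],
    pattern: str = "*.csv",
) -> List[Path]:
    """Run transform(src, dst) for every file matching pattern in input_dir.

    Each result goes to output_dir with the original stem plus a
    "-processed" tag, e.g. data.csv -> output_dir/data-processed.csv.
    """
    in_dir = Path(input_dir)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed

    written = []
    # This glob is not recursive, so an output folder nested inside
    # input_dir is never scanned and old outputs are not reprocessed.
    for src in sorted(in_dir.glob(pattern)):
        dst = out_dir / f"{src.stem}-processed{src.suffix}"
        transform(src, dst)
        written.append(dst)
    return written
```

For the question's data, transform could be a small function that calls pd.read_csv(src, index_col="DateTime", parse_dates=True), applies the resample and reindex steps, and ends with df.to_csv(dst, na_rep='NaN').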
Upvotes: 1
Reputation: 3049
The file variable in your for-loop is a string: the name of the file in your directory that you are currently processing.
for file in directory:
    print(file)
    # oldfile.csv
You can use it to build a new file name that refers to the original. Something like this:
for file in directory:
    # ... read and process df as before ...
    df.to_csv("Output - " + file, na_rep='NaN')  # make this the last line of your for-loop
    # the file will be called 'Output - oldfile.csv'
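Note that os.listdir returns bare names like 'oldfile.csv', which is why the question needs os.chdir. A minimal sketch, assuming the outputs stay in the same folder: joining each name with the folder path removes the need for os.chdir, and skipping names that already carry the prefix keeps a second run from producing 'Output - Output - ...' files. The helper name prefixed_outputs is hypothetical.

```python
import os
from typing import List, Tuple


def prefixed_outputs(folder: str, prefix: str = "Output - ") -> List[Tuple[str, str]]:
    """Return (input_path, output_path) pairs for the CSV files in folder."""
    pairs = []
    for name in sorted(os.listdir(folder)):
        # skip non-CSV files and outputs written by an earlier run
        if not name.lower().endswith(".csv") or name.startswith(prefix):
            continue
        src = os.path.join(folder, name)           # full path, no os.chdir needed
        dst = os.path.join(folder, prefix + name)  # e.g. 'Output - oldfile.csv'
        pairs.append((src, dst))
    return pairs
```

Inside the loop you would then read with pd.read_csv(src, ...) and finish with df.to_csv(dst, na_rep='NaN').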
Upvotes: 0
Reputation: 6794
Well, it will always write to the same file because you always pass the same file name to to_csv. Use os.path.splitext on os.path.basename to create a new file name based on the old one without its extension:
df.to_csv(os.path.splitext(os.path.basename(file))[0] + "-processed.csv", na_rep='NaN')
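os.path.basename on its own keeps the extension, so appending "-processed.csv" to it gives names like data.csv-processed.csv; os.path.splitext splits the extension off first. A small sketch of that naming step as a reusable helper; the name processed_path and its out_dir parameter are hypothetical, added so the result does not depend on the current working directory:

```python
import os


def processed_path(src: str, out_dir: str, tag: str = "-processed") -> str:
    """Map .../data.csv to out_dir/data-processed.csv."""
    stem, ext = os.path.splitext(os.path.basename(src))  # ('data', '.csv')
    return os.path.join(out_dir, stem + tag + ext)
```

In the loop this becomes df.to_csv(processed_path(file, output_folder), na_rep='NaN'), where output_folder is whatever directory you choose.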
Upvotes: 1
Reputation: 658
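Just change the file name in the last line on each iteration of the loop.
Something like for i, file in enumerate(directory):
and then df.to_csv("new_" + file, na_rep='NaN')
will do. Since file already ends in .csv, don't append another ".csv"; the index i is only needed if you want numbered names.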
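If you do want to use the index from enumerate, one option (an assumption, not something the answer specifies) is to combine it with the original name, splitting off the extension so it is not doubled. The helper numbered_names is hypothetical:

```python
import os
from typing import List


def numbered_names(names: List[str], prefix: str = "new_") -> List[str]:
    """Build names like new_1_oldfile.csv from the listing order."""
    result = []
    for i, name in enumerate(names, start=1):
        stem, ext = os.path.splitext(name)  # 'oldfile.csv' -> ('oldfile', '.csv')
        result.append(f"{prefix}{i}_{stem}{ext}")
    return result
```

os.listdir returns names in arbitrary order, so pass sorted(directory) if the numbering should stay the same between runs.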
Upvotes: 1