Reputation: 625

Running python code on multiple files in folder and writing them to separate files

I am writing code to run a script on multiple files in a folder. I can run the code on each file, but it saves to only one output file and then overwrites that file on every iteration. How can I get this code to save the output to separate files, preferably with names similar to each original file? This is what I have so far.

import os, re
import pandas as pd
directory = os.listdir('C:/Users/user/Desktop/NOV')
os.chdir('C:/Users/user/Desktop/NOV')

for file in directory:
    df = pd.read_csv(file, index_col="DateTime", parse_dates=True)
    df = df.resample('1min').mean()
    df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
    df.to_csv("newfile.csv", na_rep='NaN')

Upvotes: 0

Answers (4)

Jean-François Fabre

Reputation: 140307

My approach:

use glob.glob instead of os.listdir to filter out files that aren't CSV files
don't call os.chdir; it is bad practice because other modules may not be aware that you changed the current directory, and calling it twice with a relative path will fail. Building full paths with glob.glob avoids the problem.
create a file with the same name but with a "new_" prefix in the same directory (running the script twice will create a "new_new_" file, though)

code:

import os, re, glob
import pandas as pd

input_dir = 'C:/Users/user/Desktop/NOV'

for file in glob.glob(os.path.join(input_dir,"*.csv")):
    df = pd.read_csv(file, index_col="DateTime", parse_dates=True)
    df = df.resample('1min').mean()
    df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
    new_filename = os.path.join(input_dir,"new_"+os.path.basename(file))
    df.to_csv(new_filename, na_rep='NaN')
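
One way to sidestep the "new_new_" issue mentioned above is to write the results into a subfolder rather than next to the inputs. The following is only a sketch under assumptions: the helper names plan_outputs and resample_folder and the "processed" folder name are made up for illustration and are not part of the original code.

```python
from pathlib import Path


def plan_outputs(input_dir, output_subdir="processed"):
    """Pair every CSV in input_dir with a destination inside a subfolder.

    output_subdir is a hypothetical name chosen for this sketch.
    """
    input_dir = Path(input_dir)
    output_dir = input_dir / output_subdir
    output_dir.mkdir(exist_ok=True)
    # Path.glob("*.csv") does not descend into subfolders, so files already
    # written to output_dir are not picked up again on a second run.
    return [(src, output_dir / src.name) for src in sorted(input_dir.glob("*.csv"))]


def resample_folder(input_dir):
    """Apply the question's 1-minute resampling to each CSV in input_dir."""
    import pandas as pd  # imported here so plan_outputs can be used without pandas

    for src, dst in plan_outputs(input_dir):
        df = pd.read_csv(src, index_col="DateTime", parse_dates=True)
        df = df.resample("1min").mean()
        df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
        df.to_csv(dst, na_rep="NaN")
```

Calling resample_folder('C:/Users/user/Desktop/NOV') would then leave the originals untouched and put each result under NOV/processed with the same file name.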

Upvotes: 1

s3bw

Reputation: 3049

The 'file' variable in your for-loop is the name (a string) of the file currently being processed in your directory.

for file in directory:
    print(file)
    # oldfile.csv

You can use this to make a new file with a reference to the original. Something like this:

for file in directory:
    df.to_csv("Output - " + file, na_rep='NaN')  # make this the last line of your for-loop
    # File will be called 'Output - oldfile.csv'

Upvotes: 0

languitar

Reputation: 6794

It always writes to the same file because you always pass the same file name to to_csv. Use os.path.basename together with os.path.splitext to build a new file name from the old one without its extension:

df.to_csv(os.path.splitext(os.path.basename(file))[0] + "-processed.csv", na_rep='NaN')
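
For reference, os.path.basename only drops the directory part of a path, while os.path.splitext is what separates the extension. A minimal sketch of a naming helper built on both follows; the function name processed_name and its suffix parameter are hypothetical, chosen only for illustration.

```python
import os


def processed_name(path, suffix="-processed"):
    """Return the file name of `path` with `suffix` inserted before the extension."""
    # basename strips the directory; splitext splits off the last extension only.
    stem, ext = os.path.splitext(os.path.basename(path))
    return stem + suffix + ext


print(processed_name("C:/Users/user/Desktop/NOV/oldfile.csv"))
# oldfile-processed.csv
```

Inside the loop this could be used as df.to_csv(processed_name(file), na_rep='NaN').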

Upvotes: 1

czr

Reputation: 658

Just change the file name in the last line on each iteration of the loop. Since file already holds the name including its .csv extension, something like df.to_csv("new_" + file, na_rep='NaN') will do.

Upvotes: 1
