Sarah Carroll

Reputation: 1

Python - processing all files in a specific folder

I'm fairly new to Python (the only programming language I know), and I have a set of spectral data saved as .txt files. Each row is a data point: the first number is the wavelength of light used, followed by a tab, and the second number is the instrument's signal/response to that wavelength.

I want to take all the data files in a folder and write out a file containing the average of the signal/response column entries at each wavelength (every file covers responses from 350 to 2500 nm light). Is there any way to do this? If I didn't need to average 103 spectra together, I'd just do it by hand, but...

EDIT: I realize I worded this terribly. I now see that I can probably just use os to access all the files in a given folder. What I want is to average the signal values at each wavelength, i.e. read all the data from the folder and get an average value for the signal/response at 350 nm, 351 nm, etc. I think this is something I could do with a loop once I get all the files read into Python, but I'm not 100% sure. I'm also hesitant because I'm worried that it will slow the program down a lot.

Upvotes: 0

Views: 2727

Answers (3)

Peter Cordes

Reputation: 364287

If you're on anything but Windows, a common way to do this is to write a Python program that processes every file named on its command line. You can then run it on results/* to process everything, on a single file, or on just a few files.

This would be the more Unixy way to go about things. Many Unix programs can handle multiple input files (cat, sort, awk, etc.), but most of them leave the directory traversal to the shell.

http://www.diveintopython.net/scripts_and_streams/command_line_arguments.html has some examples of getting at the command line args for your program.

import sys

for arg in sys.argv[1:]:  # argv[0] is the script's name; skip it
    # print arg
    sum_file(arg)  # or put the code inline here, so you don't need global variables to keep state between calls.

print("totals ...")  # parenthesized form runs on both Python 2 and Python 3

See also this question: What is "argv", and what does it do?
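
To make the loop above concrete, here is a minimal sketch of what the summing step might look like if it kept per-wavelength totals instead of global state. The function name average_spectra is made up for illustration, and it assumes each line holds a wavelength and a signal separated by whitespace, as the question describes.

```python
import sys
from collections import defaultdict


def average_spectra(paths):
    """Return {wavelength: mean signal} over all the files in paths.

    Hypothetical helper: assumes two whitespace-separated numbers per line.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for path in paths:
        with open(path) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                wavelength = float(fields[0])
                sums[wavelength] += float(fields[1])
                counts[wavelength] += 1
    return {w: sums[w] / counts[w] for w in sums}


if __name__ == "__main__":
    # argv[0] is the script's name; everything after it is a data file.
    averages = average_spectra(sys.argv[1:])
    for wavelength in sorted(averages):
        print("%g\t%.4f" % (wavelength, averages[wavelength]))
```

Saved under some name of your choosing, it would be run with the data files as arguments (for example results/*.txt) and its output redirected to a file. A plain loop like this over 103 small text files should not be noticeably slow.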

Upvotes: 0

Dalen

Reputation: 4236

import os

dir = "./" # Your directory

lengths   = 0
responses = 0
total     = 0

for x in os.listdir(dir):
    # Check if x has *.txt extension.
    if os.path.splitext(x)[1]!=".txt": continue
    fullname = os.path.join(dir, x)
    # We don't want directories ending with *.txt to mess up our program (although in your case this is very unlikely)
    if os.path.isdir(fullname): continue
    # Now open and read the file as binary
    file = open(fullname, "rb")
    content = file.read()
    file.close()
    # Take two entries:
    content = content.split()
    l = float(content[0])
    r = float(content[1])
    lengths += l; responses += r
    total += 1

print("Avg of lengths: %s" % (lengths / total))
print("Avg of responses: %s" % (responses / total))

If you want it to descend into subdirectories, put the loop into a function and have it call itself whenever os.path.isdir(fullname) is True.
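
As an alternative to writing the recursion by hand, the standard library's os.walk already visits every subdirectory. The following is only a sketch of that swapped-in approach; the function name find_txt_files is invented for illustration.

```python
import os


def find_txt_files(root):
    """Return a sorted list of paths to every .txt file under root.

    Hypothetical helper. os.walk lists directories separately from files,
    so a directory whose name ends in .txt is never returned here.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] == ".txt":
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The returned list can then be fed to whatever per-file processing you already have.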

Although I wrote you the code, SO is not for that. Mind that in your next question.

Upvotes: 0

dermen

Reputation: 5372

Something like this (assuming all your .txt files are formatted the same way and all cover the same range of wavelength values):

import os

import numpy as np


dat_dir   = '/my/dat/dir'
fnames    = [ os.path.join(dat_dir, x) for x in os.listdir(dat_dir) if x.endswith('.txt') ]

data      = [ np.loadtxt( f) for f in fnames ]
xvals     = data[0][:,0] #wavelengths, should be the same in each file
yvals     = [ d[:,1] for d in data ] #measurement

y_mean    = np.mean(yvals, axis=0 ) 

np.savetxt( 'spectral_ave.txt', np.column_stack((xvals, y_mean)), fmt='%.4f') # two columns: wavelength, mean signal
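
Since the averaging relies on every file sharing the same wavelength grid, it may be worth checking that assumption before taking the mean. Below is a rough sketch of such a check; mean_spectrum is a made-up name, and it assumes each element of the input list is a two-column array like the ones np.loadtxt returns for these files.

```python
import numpy as np


def mean_spectrum(spectra):
    """Average the signal column of several (n_points, 2) arrays.

    Hypothetical helper: raises ValueError if any array's wavelength
    column differs from that of the first array.
    """
    reference = spectra[0]
    wavelengths = reference[:, 0]
    for index, spectrum in enumerate(spectra[1:], start=1):
        same_shape = spectrum.shape == reference.shape
        if not same_shape or not np.allclose(spectrum[:, 0], wavelengths):
            raise ValueError("spectrum %d has a different wavelength grid" % index)
    signals = np.array([s[:, 1] for s in spectra])
    return wavelengths, signals.mean(axis=0)
```

Called on the list of loaded arrays, it returns the wavelengths and the per-wavelength mean, or stops with an error naming the first file (by position in the list) that does not line up.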

Upvotes: 1
