Reputation: 1
I'm somewhat newish to Python (which is the only programming language I know), and I've got a bunch of spectral data saved as .txt files. Each row is a data point: the first number is the wavelength of light used, and the second number, separated from it by a tab, is the instrument signal/response to that wavelength of light.
I want to be able to take all the data files I have in a folder, and print a file that's an average of all the signal/response column entries for each wavelength of light (they all contain data for responses from 350-2500nm light). Is there any way to do this? If it weren't for the fact that I need to average together 103 spectra, I'd just do it by hand, but....
EDIT: I realize I worded this terribly. I now realize I can probably just use os to access all the files in a given folder. The thing is that I want to average the signal values for each wavelength. I.e., I want to read all the data from the folder and get an average value for the signal/response at 350nm, 351nm, etc. I'm thinking this is something I could do with a loop once I get all the files read into Python, but I'm not 100% sure. I'm also hesitant because I'm worried that will slow down the program a lot.
Upvotes: 0
Views: 2727
Reputation: 364287
If you're on anything but Windows, a common way to do this would be to write a Python program that handles all the files you put on the command line. Then you can run it on results/* to process everything, or just on a single file, or just on a few files.
This would be the more Unixy way to go about things. There are many Unix programs that can handle multiple input files (cat, sort, awk, etc.), but most of them leave the directory traversal to the shell.
http://www.diveintopython.net/scripts_and_streams/command_line_arguments.html has some examples of getting at the command line args for your program.
import sys
for arg in sys.argv[1:]:  # argv[0] is the script's name; skip it
    # print(arg)
    sum_file(arg)  # or put the code inline here, so you don't need global variables to keep state between calls
print("totals ...")
See also this question: What is "argv", and what does it do?
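As a minimal sketch of what the loop above could accumulate, assuming each file holds whitespace- or tab-separated rows of wavelength then signal: the function name average_spectra is hypothetical and not from the answer, and it keeps a running sum and count per wavelength instead of using global variables.

```python
import sys
from collections import defaultdict


def average_spectra(paths):
    """Return {wavelength: mean signal} over all files in paths (hypothetical helper)."""
    sums = defaultdict(float)   # running total of the signal per wavelength
    counts = defaultdict(int)   # number of files that contributed to each wavelength
    for path in paths:
        with open(path) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                wavelength = float(fields[0])
                sums[wavelength] += float(fields[1])
                counts[wavelength] += 1
    return {w: sums[w] / counts[w] for w in sums}


if __name__ == "__main__":
    # Write "wavelength<TAB>mean" rows to stdout, sorted by wavelength.
    for wavelength, mean in sorted(average_spectra(sys.argv[1:]).items()):
        print("%g\t%g" % (wavelength, mean))
```

Saved as, say, average.py (an assumed file name), it could be run as python average.py results/*.txt > mean.txt. For 103 files of a couple of thousand rows each, a plain loop like this should not be noticeably slow.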
Upvotes: 0
Reputation: 4236
import os
dir = "./"  # Your directory
lengths = 0
responses = 0
total = 0
for x in os.listdir(dir):
    # Check if x has *.txt extension.
    if os.path.splitext(x)[1] != ".txt":
        continue
    fullname = os.path.join(dir, x)
    # We don't want directories ending with *.txt to mess up our program (although in your case this is very unlikely)
    if os.path.isdir(fullname):
        continue
    # Now open and read the file as binary
    file = open(fullname, "rb")
    content = file.read()
    file.close()
    # Take the first two entries (the first wavelength and its response):
    content = content.split()
    l = float(content[0])
    r = float(content[1])
    lengths += l
    responses += r
    total += 1
print("Avg of lengths: %s" % (lengths / total))
print("Avg of responses: %s" % (responses / total))
If you want it to enter subdirectories, put it into a function and make it recurse when os.path.isdir(fullname) is True.
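A minimal sketch of the subdirectory case, swapping in os.walk from the standard library for the hand-written recursion described above; the name find_txt_files is hypothetical. os.walk reports directories separately from files, so no os.path.isdir check is needed here.

```python
import os


def find_txt_files(root):
    """Yield paths of .txt files under root, including subdirectories (hypothetical helper)."""
    for dirpath, dirnames, filenames in os.walk(root):
        # filenames holds only non-directory entries of dirpath
        for name in sorted(filenames):
            if os.path.splitext(name)[1] == ".txt":
                yield os.path.join(dirpath, name)
```

The loop in the code above could then iterate over find_txt_files(dir) instead of os.listdir(dir).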
Although I wrote you the code, SO is not for that. Mind that in your next question.
Upvotes: 0
Reputation: 5372
Something like this (assuming all your txt files are formatted the same, and that all files have the same range of wavelength values):
import os
import numpy as np
dat_dir = '/my/dat/dir'
fnames = [ os.path.join(dat_dir, x) for x in os.listdir(dat_dir) if x.endswith('.txt') ]  # directory first, then file name
data = [ np.loadtxt( f) for f in fnames ]
xvals = data[0][:,0] #wavelengths, should be the same in each file
yvals = [ d[:,1] for d in data ] #measurement
y_mean = np.mean(yvals, axis=0 )
np.savetxt( 'spectral_ave.txt', np.column_stack((xvals, y_mean)), fmt='%.4f') # two columns: wavelength, mean signal
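The code above relies on every file sharing the same wavelength column. A minimal sketch of checking that assumption before averaging, taking the list of arrays that np.loadtxt produced; the name mean_spectrum is hypothetical, and np.allclose is used so tiny floating-point differences do not trigger the error.

```python
import numpy as np


def mean_spectrum(spectra):
    """Average a list of (n_points, 2) arrays that share one wavelength column."""
    wavelengths = spectra[0][:, 0]
    for spectrum in spectra[1:]:
        # Compare shapes first so allclose never sees mismatched lengths.
        if spectrum.shape != spectra[0].shape or not np.allclose(spectrum[:, 0], wavelengths):
            raise ValueError("wavelength columns differ between files")
    signals = np.array([spectrum[:, 1] for spectrum in spectra])
    # Column 0: wavelength, column 1: mean signal across all spectra.
    return np.column_stack((wavelengths, signals.mean(axis=0)))
```

The result can be passed straight to np.savetxt, for example np.savetxt('spectral_ave.txt', mean_spectrum(data), fmt='%.4f').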
Upvotes: 1