Reputation: 247
I have ~200 files and would like to grab the data from each file, then show all the data together in one .csv file.
For example, the file list is
#targeted folder
a001.opd
a002.opd
a003.opd
...
..
.
a200.opd
Each file has the same data structure which looks like
model <many spaces> 1 <many spaces> 0.003 0.002 # Title
mo(data,1) <many spaces> 1 <many spaces> 0.2 0.0001 # 1
mo(data,1) <many spaces> 2 <many spaces> -0.1 0.04 # 2
mo(data,1) <many spaces> 3 <many spaces> -0.4 0.005 # 3
....................................
................
............. # n-1
...... # n
I would like to see the following in my grab_result.csv file; does anyone know how to achieve this in Python?
# grab_result.csv (columns ordered left to right, from a001 to a200)
a001 a002
model 1 0.003 0.002 <empty column> model 1 0.02 0.1 <empty column>
mo(data,1) 1 0.2 0.0001 <empty column> mo(data,1) 1 0.04 0.001 <empty column>
mo(data,1) 2 -0.1 0.04 <empty column> mo(data,1) 2 -0.145 0.014 <empty column>
mo(data,1) 3 -0.2 0.003 <empty column> mo(data,1) 3 -0.24 0.06 <empty column>
Below is the code I have so far.
import os

def openfolder(path, outputfile='grab_result.csv'):
    # get .opd files from the folder and open an output file
    if os.path.isdir(path):
        fo = open(outputfile, 'wb')
        fo.write('filename')  # write title here
        for filename in [os.path.abspath(path) + '\\' + each
                         for each in os.listdir(path)
                         if each.endswith('.opd')]:
            return openfile(filename)  # note: this returns after the first file
    else:
        print "path unavailable"

def openfile(filename):
    # open a .opd file
    if os.path.isfile(filename) and filename.endswith('.opd'):
        return grabdata(open(filename, 'rb').read())
    else:
        print "invalid file"
        return []

def grabdata(string):
    # start to grab data
    ret = []
    idx_data = string.find('model')
    # then I stop here....

openfolder('C:\\path', 'C:\\path\\grab_result.csv')
Does anyone know how to grab the data from these files?
Here is my example file ( http://goo.gl/HyT0wM )
Upvotes: 1
Views: 953
Reputation: 109546
This is not an answer, just an extended comment.
Why do you want the results from left to right (200 x 5 columns wide)? Wouldn't it provide more flexibility for adding additional columns later if you were to transpose your data? For example:
a001 model mo(data,1) mo(data,1) mo(data,1)
1 1 2 3
0.003 0.2 -0.1 -0.2
0.002 0.0001 0.04 0.003
a002 model mo(data,1) mo(data,1) mo(data,1)
1 1 2 3
...
The difficulty in having it 200 x 5 columns wide is that you need to pad columns. If a file were missing information, then it could throw off your entire structure. You also need to write each row consisting of a single slice from all 200 files.
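For what it's worth, here is a minimal sketch of what that transposed layout could look like using the csv module. parse_opd is a hypothetical helper, and the hard-coded file names and four-field row structure are assumptions taken from the question:
import csv

def parse_opd(filename):
    # hypothetical helper: split each data line into four parallel lists
    labels, indices, col1, col2 = [], [], [], []
    with open(filename) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 4:  # ignore malformed or empty lines
                labels.append(parts[0])
                indices.append(parts[1])
                col1.append(parts[2])
                col2.append(parts[3])
    return labels, indices, col1, col2

with open('grab_result.csv', 'wb') as out:
    writer = csv.writer(out)
    for name in ['a001.opd', 'a002.opd']:  # one four-row block per file
        labels, indices, col1, col2 = parse_opd(name)
        writer.writerow([name.split('.')[0]] + labels)
        writer.writerow([''] + indices)
        writer.writerow([''] + col1)
        writer.writerow([''] + col2)
A missing line in one file then only shortens that file's block, instead of shifting every column to its right.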
Upvotes: 1
Reputation: 13549
It would be something like this:
def grabdata(filename):
    # start to grab data
    matches = []
    with open(filename) as f:
        for line in f:
            # add the line if it matches:
            if line.startswith("model"):  # or: line.find("model") != -1
                matches.append(line.strip())
    return matches
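A quick way to run this over every .opd file, keyed by file name (the glob call and the dict are my own addition, not part of the answer above):
import glob

results = {}
for filename in glob.glob('*.opd'):
    # collect the matching lines of each file under its name
    results[filename] = grabdata(filename)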
Upvotes: 1
Reputation: 15433
If you have many files with lots of content, I would use generators. That avoids loading all the contents into memory at once. Here is how I would go about it:
def get_all_files(path):
    ## get a generator with all file names
    import os
    import glob
    return glob.iglob(os.path.join(path, '*.opd'))

def get_all_data(files):
    ## get a generator with all the data from all the files
    for fil in files:
        with open(fil, 'r') as the_file:
            for line in the_file:
                yield line

def write_lines_to_file(lines, outfile):
    with open(outfile, 'w') as the_file:
        for line in lines:
            ## add here an if statement if not all lines should be written to outfile
            the_file.write(line)  # lines read from a file already end with '\n'

path = 'blah blah'
outfile = 'blah.csv'
files = get_all_files(path)
lines = get_all_data(files)
write_lines_to_file(lines, outfile)
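If only some lines should end up in the output, one option is to filter the generator before writing. The 'mo' keyword below is just an example predicate, chosen because it would match both the "model" and "mo(data,1)" lines from the question:
def filter_lines(lines, keyword):
    ## yield only the lines containing the keyword
    for line in lines:
        if keyword in line:
            yield line

files = get_all_files(path)
lines = filter_lines(get_all_data(files), 'mo')
write_lines_to_file(lines, outfile)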
Upvotes: 2