Reputation: 1
I've been trying to figure out how to manipulate this slightly unusually formatted data into a plottable format using just python (I'd been doing it with a shell script using sed and the like, but I want to be doing all my scripting in python long term as that's what I usually use).
My data looks like this:
# Title of File
# step number_of_slices total_a
# slice Coord N v
51000 5 240000
1 0.025 12003 0.0255628
2 0.075 11991 0.0257368
3 0.125 11989 0.0258158
4 0.175 11997.2 0.0259262
5 0.225 11995.8 0.0258637
52000 5 240000
1 0.025 12004.7 0.0251662
2 0.075 11998.7 0.0256496
3 0.125 11996.3 0.025816
4 0.175 11994 0.0259593
5 0.225 12008.3 0.0258245
.
.
.
1010000 5 240000
1 0.025 12304.6 0.0182998
2 0.075 12146.1 0.0195533
3 0.125 12026.9 0.0211158
4 0.175 12003.5 0.0228836
5 0.225 12000.3 0.0242854
And I want the data from just the fourth column appended to a single file for each 'step', i.e.
Steps 51000 52000 ... 1010000
1 0.0255628 0.0251662 ... 0.0182998
2 0.0257368 0.0256496 ... 0.0195533
3 0.0258158 0.025816 ... 0.0211158
4 0.0259262 0.0259593 ... 0.0228836
5 0.0258637 0.0258245 ... 0.0242854
In bash this was pretty easy. I cut the fourth column of every 6 lines and appended to a new file. But I can't for the life of me figure out how to do this with just python.
This is the best I got:
import csv
f = open('file.dat')
csv_f = csv.reader(f, delimiter = " ")
column = []
for row in csv_f:
    column.append(row[5])
print column
f.close()
The index is 5 because the first two columns come out empty (I guess that's a formatting thing). But because some rows have only 3 elements, this gives me an error, so I can't even isolate the column to begin getting the format I want:
['51000', '20', '240000']
['', '', '1', '0.025', '12003', '0.0255628']
['', '', '2', '0.075', '11991', '0.0257368']
['', '', '3', '0.125', '11989', '0.0258158']
['', '', '4', '0.175', '11997.2', '0.0259262']
['', '', '5', '0.225', '11995.8', '0.0258637']
Traceback (most recent call last):
  File "open.py", line 13, in <module>
    column.append(row[5])
IndexError: list index out of range
At this point, I think I've overcomplicated it, and any solution I come up with will be quite convoluted instead of streamlining my workflow as intended. What's the 'correct' way? Please and thank you
Upvotes: 0
Views: 78
Reputation: 77837
Simply "chunk" your input in packets of 6 lines. File the data in parallel lists. Don't even bother with the CSV reader; you don't need the structure.
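num_of_header_lines = 3  # the three '#' comment lines at the top of the file
step = []
value = [[] for _ in range(5)]  # initialize 5 value lists, one per slice
with open('file.dat') as f:
    # skip the comment header
    for _ in range(num_of_header_lines):
        f.readline()
    while True:
        line = f.readline()
        if not line.strip():
            break  # readline() returns '' at end of file; also stop on a blank line
        # extract step: first value on the line
        step.append(int(line.split()[0]))
        for phase in range(5):
            # Extract the last value for the appropriate phase list
            value[phase].append(float(f.readline().split()[-1]))
The header count (3) and the slice count (5) are taken from your sample, so adjust them if your real file differs. The logic is this: you grab a line and append the step number to the step list. Then you read five more lines, grabbing the last value off each line for its corresponding sub-list. Calling split() with no arguments splits on any run of whitespace, so the leading spaces that produced empty columns with the CSV reader are not a problem.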
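To get from the parsed lists to the table in the question, something along these lines might work. This is only a sketch under a few assumptions: the function names `read_last_column` and `format_table` are made up for illustration, every block is assumed to be complete, and the slice count is read from the second field of each step line instead of being hard-coded. The values are kept as strings so they are written out exactly as they appear in the input.

```python
import io

def read_last_column(lines):
    """Collect the last column of every block.

    Assumes each block is a line 'step number_of_slices total'
    followed by exactly number_of_slices data lines.
    """
    steps = []    # one step number (as a string) per block
    columns = []  # columns[i] holds the last-column values of block i
    rows = (ln.split() for ln in lines)
    # drop blank lines and '#' comment lines
    rows = (r for r in rows if r and not r[0].startswith('#'))
    for header in rows:
        steps.append(header[0])
        n_slices = int(header[1])
        # pull the next n_slices rows from the same iterator
        columns.append([next(rows)[-1] for _ in range(n_slices)])
    return steps, columns

def format_table(steps, columns):
    """Build the 'Steps ...' table: one row per slice, one column per step."""
    out = ['Steps ' + ' '.join(steps)]
    # zip(*columns) transposes the per-step lists into per-slice rows
    for i, row in enumerate(zip(*columns), start=1):
        out.append(str(i) + ' ' + ' '.join(row))
    return '\n'.join(out)

# The first two blocks from the question, used here in place of a real file.
sample = """\
# Title of File
# step number_of_slices total_a
# slice Coord N v
51000 5 240000
  1 0.025 12003 0.0255628
  2 0.075 11991 0.0257368
  3 0.125 11989 0.0258158
  4 0.175 11997.2 0.0259262
  5 0.225 11995.8 0.0258637
52000 5 240000
  1 0.025 12004.7 0.0251662
  2 0.075 11998.7 0.0256496
  3 0.125 11996.3 0.025816
  4 0.175 11994 0.0259593
  5 0.225 12008.3 0.0258245
"""

steps, columns = read_last_column(io.StringIO(sample))
print(format_table(steps, columns))
```

For the real data you would pass the open file object instead of the `io.StringIO` wrapper, for example `with open('file.dat') as f: steps, columns = read_last_column(f)`, and write the result of `format_table` to a new file. Note that `zip` stops at the shortest list, so a block with fewer slices than the others would silently truncate the table.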
Upvotes: 1