matsci05

Reputation: 1

Python data manipulation for unusual data format

I've been trying to figure out how to manipulate this slightly unusually formatted data into a plottable format using just Python. I'd been doing it with a shell script using sed and the like, but I want to do all my scripting in Python long term, as that's what I usually use.

My data looks like this:

# Title of File
# step number_of_slices total_a
# slice Coord N v
51000 5 240000
  1 0.025 12003 0.0255628
  2 0.075 11991 0.0257368
  3 0.125 11989 0.0258158
  4 0.175 11997.2 0.0259262
  5 0.225 11995.8 0.0258637
52000 5 240000
  1 0.025 12004.7 0.0251662
  2 0.075 11998.7 0.0256496
  3 0.125 11996.3 0.025816
  4 0.175 11994 0.0259593
  5 0.225 12008.3 0.0258245
  .
  .
  .
1010000 5 240000
  1 0.025 12304.6 0.0182998
  2 0.075 12146.1 0.0195533
  3 0.125 12026.9 0.0211158
  4 0.175 12003.5 0.0228836
  5 0.225 12000.3 0.0242854

I want just the fourth column from each 'step' collected into a single file, with one column per step, i.e.

Steps 51000 52000 ... 1010000
1 0.0255628 0.0251662 ... 0.0182998
2 0.0257368 0.0256496 ... 0.0195533
3 0.0258158 0.025816 ... 0.0211158
4 0.0259262 0.0259593 ... 0.0228836
5 0.0258637 0.0258245 ... 0.0242854

In bash this was pretty easy. I cut the fourth column of every 6 lines and appended to a new file. But I can't for the life of me figure out how to do this with just python.

This is the best I got:

import csv

f = open('file.dat')
csv_f = csv.reader(f, delimiter = " ")

column = []

for row in csv_f:
        column.append(row[5])
print column

f.close()

The 5 is because I end up with the first two columns empty (I guess that's a formatting thing). But because some rows have only 3 elements, this gives me an error, so I can't even isolate the column to begin getting the format I want:

['51000', '20', '240000']
['', '', '1', '0.025', '12003', '0.0255628']
['', '', '2', '0.075', '11991', '0.0257368']
['', '', '3', '0.125', '11989', '0.0258158']
['', '', '4', '0.175', '11997.2', '0.0259262']
['', '', '5', '0.225', '11995.8', '0.0258637']

Traceback (most recent call last):
  File "open.py", line 13, in <module>
    column.append(row[5])
IndexError: list index out of range
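For reference, the empty leading fields and the IndexError both come from how the line is split. The short sketch below, using one data line and one step line copied from the sample above, compares csv.reader with a single-space delimiter against str.split() with no argument. The variable names are purely illustrative.

```python
import csv

data_line = "  1 0.025 12003 0.0255628"
step_line = "51000 5 240000"

# csv.reader treats every single space as a separator, so the two
# leading spaces become two empty fields at the front of the row.
csv_row = next(csv.reader([data_line], delimiter=" "))

# str.split() with no argument splits on runs of whitespace and
# ignores leading/trailing whitespace, so no empty fields appear.
split_row = data_line.split()

print(csv_row)    # ['', '', '1', '0.025', '12003', '0.0255628']
print(split_row)  # ['1', '0.025', '12003', '0.0255628']

# The step line has only three fields, so a fixed index such as [5]
# is out of range there; [-1] always exists on a non-empty line.
print(step_line.split()[-1])  # 240000
```

Indexing with [-1] avoids the out-of-range error, but the step lines still have to be told apart from the data lines before the column can be collected.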

At this point, I think I've overcomplicated it, and any solution I come up with will be quite convoluted instead of streamlining my workflow as intended. What's the 'correct' way? Please and thank you

Upvotes: 0

Views: 78

Answers (1)

Prune

Reputation: 77837

Simply "chunk" your input in packets of 6 lines. File the data in parallel lists. Don't even bother with the CSV reader; you don't need the structure.

num_of_header_lines = 3  # the three '#' lines at the top of the file

step = []
value = [[] for _ in range(5)]  # initialize 5 value lists

with open('file.dat') as f:
    for _ in range(num_of_header_lines):
        f.readline()

    while True:
        fields = f.readline().split()
        if not fields:  # readline() returns '' at end of file
            break
        # extract step: first value on the line
        step.append(int(fields[0]))
        for phase in range(5):
            # Extract the last value for the appropriate phase list
            value[phase].append(float(f.readline().split()[-1]))

This is the internal logic. After skipping the header lines, you grab a line and append the step number to the step list. Then you read five more lines, grabbing the last value off each line for its corresponding sub-list. The loop ends when readline() returns an empty string at end of file (a blank line stops it too, since it also splits to an empty list).
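To get from the parallel lists to the layout asked for in the question, one possible sketch is shown below. It is not part of the original answer: the function names collect_last_column and format_table are hypothetical, it assumes the second field of each step line (number_of_slices) gives the number of data lines that follow, and it keeps the values as strings so they are written out exactly as they appear in the input.

```python
def collect_last_column(lines):
    """Return (steps, rows); rows[k] holds slice k+1's last-column value at each step."""
    # Drop blank lines and '#' header lines, then split on whitespace.
    data = [ln.split() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]
    steps, rows = [], []
    i = 0
    while i < len(data):
        # Step line: "<step> <number_of_slices> <total>" (assumed layout).
        step, n_slices = data[i][0], int(data[i][1])
        steps.append(step)
        for k, fields in enumerate(data[i + 1 : i + 1 + n_slices]):
            if k == len(rows):       # first time this slice index is seen
                rows.append([])
            rows[k].append(fields[-1])
        i += 1 + n_slices            # jump to the next step line
    return steps, rows


def format_table(steps, rows):
    """Build the 'Steps ...' header followed by one line per slice."""
    out = ["Steps " + " ".join(steps)]
    for k, values in enumerate(rows, start=1):
        out.append(f"{k} " + " ".join(values))
    return "\n".join(out)


# First two steps copied from the question's sample data.
sample = """\
# Title of File
# step number_of_slices total_a
# slice Coord N v
51000 5 240000
  1 0.025 12003 0.0255628
  2 0.075 11991 0.0257368
  3 0.125 11989 0.0258158
  4 0.175 11997.2 0.0259262
  5 0.225 11995.8 0.0258637
52000 5 240000
  1 0.025 12004.7 0.0251662
  2 0.075 11998.7 0.0256496
  3 0.125 11996.3 0.025816
  4 0.175 11994 0.0259593
  5 0.225 12008.3 0.0258245
"""

steps, rows = collect_last_column(sample.splitlines())
table = format_table(steps, rows)
print(table)
```

For a real file, an open file object can be passed in place of sample.splitlines(), for example inside "with open('file.dat') as f:", and the returned string can be written to an output file of your choosing.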

Upvotes: 1
