Python data manipulation for unusual data format

Question

I've been trying to figure out how to manipulate this slightly unusually formatted data into a plottable format using just Python. (I'd been doing it with a shell script using sed and the like, but I want to do all my scripting in Python long term, as that's what I usually use.)

My data looks like this:

# Title of File
# step number_of_slices total_a
# slice Coord N v
51000 5 240000
  1 0.025 12003 0.0255628 
  2 0.075 11991 0.0257368
  3 0.125 11989 0.0258158
  4 0.175 11997.2 0.0259262
  5 0.225 11995.8 0.0258637
52000 5 240000
  1 0.025 12004.7 0.0251662
  2 0.075 11998.7 0.0256496
  3 0.125 11996.3 0.025816
  4 0.175 11994 0.0259593
  5 0.225 12008.3 0.0258245
  .
  .
  .
1010000 5 240000
  1 0.025 12304.6 0.0182998
  2 0.075 12146.1 0.0195533
  3 0.125 12026.9 0.0211158
  4 0.175 12003.5 0.0228836
  5 0.225 12000.3 0.0242854

And I want just the fourth column (v) from each block collected into a single file, with one column per 'step', i.e.

Steps 51000 52000 ... 1010000
1 0.0255628 0.0251662 ... 0.0182998
2 0.0257368 0.0256496 ... 0.0195533
3 0.0258158 0.025816 ... 0.0211158
4 0.0259262 0.0259593 ... 0.0228836
5 0.0258637 0.0258245 ... 0.0242854

In bash this was pretty easy: I cut the fourth column out of every block of 6 lines and appended it to a new file. But I can't for the life of me figure out how to do this with just Python.

This is the best I got:

import csv

f = open('file.dat')
csv_f = csv.reader(f, delimiter = " ")

column = []

for row in csv_f:
        column.append(row[5])
print column

f.close()

The index is 5 because each data row starts with two empty fields (I guess that's a formatting thing, from the leading spaces), but the step rows have only 3 elements, so row[5] raises an error. I can't even isolate the column, let alone get the format I want:

['51000', '20', '240000']
['', '', '1', '0.025', '12003', '0.0255628']
['', '', '2', '0.075', '11991', '0.0257368']
['', '', '3', '0.125', '11989', '0.0258158']
['', '', '4', '0.175', '11997.2', '0.0259262']
['', '', '5', '0.225', '11995.8', '0.0258637']

Traceback (most recent call last):
  File "open.py", line 13, in <module>
    column.append(row[5])
IndexError: list index out of range

At this point, I think I've overcomplicated it, and any solution I come up with will be quite convoluted instead of streamlining my workflow as intended. What's the 'correct' way? Please and thank you

Answers (1)
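
One possible approach, as a minimal sketch in plain Python 3 rather than a definitive solution. It assumes the layout shown in the question: comment lines start with #, each step line has exactly 3 whitespace-separated fields, and each slice line has exactly 4. The function names collect_fourth_column and format_table are made up for illustration. The main change from the csv attempt is str.split() with no argument, which collapses runs of spaces, so the leading indentation no longer produces empty fields.

```python
SAMPLE = """\
# Title of File
# step number_of_slices total_a
# slice Coord N v
51000 5 240000
  1 0.025 12003 0.0255628
  2 0.075 11991 0.0257368
  3 0.125 11989 0.0258158
  4 0.175 11997.2 0.0259262
  5 0.225 11995.8 0.0258637
52000 5 240000
  1 0.025 12004.7 0.0251662
  2 0.075 11998.7 0.0256496
  3 0.125 11996.3 0.025816
  4 0.175 11994 0.0259593
  5 0.225 12008.3 0.0258245
"""


def collect_fourth_column(lines):
    """Return (steps, rows): the step labels, and a dict mapping each
    slice number to its list of v values, one value per step."""
    steps = []
    rows = {}  # dicts keep insertion order in Python 3.7+
    for line in lines:
        fields = line.split()  # no argument: splits on any run of whitespace
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment headers
        if len(fields) == 3:
            # step line: step number_of_slices total_a
            steps.append(fields[0])
        elif len(fields) == 4:
            # slice line: slice Coord N v -> keep v under its slice number
            rows.setdefault(fields[0], []).append(fields[3])
    return steps, rows


def format_table(steps, rows):
    """Build the 'one column per step' text table."""
    out = ["Steps " + " ".join(steps)]
    for slice_id, values in rows.items():
        out.append(slice_id + " " + " ".join(values))
    return "\n".join(out)


steps, rows = collect_fourth_column(SAMPLE.splitlines())
print(format_table(steps, rows))
```

For the two sample blocks this prints:

Steps 51000 52000
1 0.0255628 0.0251662
2 0.0257368 0.0256496
3 0.0258158 0.025816
4 0.0259262 0.0259593
5 0.0258637 0.0258245

To run it on the real data, an open file can be passed directly, since a file object iterates line by line: open 'file.dat' in a with block and call collect_fourth_column(f). The values are kept as strings so they are written out exactly as they appear in the input; convert with float() if you want to plot them directly.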
