Reputation: 57
I want to remove the lines that contain strings, as well as the empty lines, from a text file. It looks like the example below. As you can see, the header repeats itself throughout the file. The number of data lines varies from block to block. I need to import the data as a NumPy array. At first the file used a comma as the decimal separator; at least I was able to change that.
I tried this, but it doesn't work at all:
from types import StringType
z = open('D:\Desktop\cycle 1-20 20-50 kPa (dot).dat', 'r')
for line in z.readlines():
    for x in z:
        if type(z.readline(x)) is StringType:
            print line
z.close()
Example of data:
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
Upvotes: 3
Views: 403
Reputation: 18022
Python reads every line of a file as a string until you convert it yourself, so checking the type cannot tell header lines from data lines, and your method won't work.
Your best bet is probably to use a regular expression to filter out lines that contain non-data characters.
import re

with open("datafile") as f:
    for line in f:
        # Skip every line containing a character that is not a digit, sign, dot, or whitespace
        if re.search(r"[^-0-9.\s]", line):
            continue
        # Skip empty lines
        if len(line.strip()) == 0:
            continue
        # Keep the rest
        print(line)
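Since the goal is a NumPy array rather than printed lines, the same filter can feed numpy.loadtxt directly, because loadtxt accepts a generator of lines as well as a file name. The following is only a minimal sketch: the helper names data_lines and load_blocks are made up for illustration, and the in-memory sample stands in for the real file.

```python
import io
import re

import numpy as np

# Same idea as the filter above: any character other than a digit, sign, dot,
# or whitespace marks a header line. Note that this would also reject exponent
# notation such as 1e-05, which the sample data does not contain.
NON_DATA = re.compile(r"[^-0-9.\s]")


def data_lines(lines):
    """Yield only the lines that look like rows of numbers (hypothetical helper)."""
    for line in lines:
        if not line.strip() or NON_DATA.search(line):
            continue
        yield line


def load_blocks(source):
    """Read every numeric row from an open file or list of lines into a 2-D array."""
    return np.loadtxt(data_lines(source), ndmin=2)


# Small in-memory stand-in for the real file, using rows from the question.
sample = io.StringIO(
    "bla bla\n"
    "Points : 4223\n"
    "\n"
    "1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596\n"
    "bla bla\n"
    "1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729\n"
)
data = load_blocks(sample)
print(data.shape)
```

With a real file, passing the open file object (for example inside a with open(...) block) to load_blocks should work the same way. All blocks end up stacked in one array, so this only fits if the block boundaries do not matter for the analysis.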
Upvotes: 4
Reputation: 28868
Why not use numpy.loadtxt? It has a very nice interface for exactly these cases.
See the documentation for numpy.loadtxt.
yourArry = np.loadtxt('yourfilename.txt', skiprows=7)
Note that skiprows only skips lines at the top of the file. Since your header repeats throughout the file (a header is normally something found only at the top of a file), you could split your file into multiple files, one per block. You could do it with Python, or you could use the UNIX command csplit. Here is how to do it, and what you will get:
oz123@:~/tmp> csplit -k data.txt '/^bla/' '{*}'
0
787
786
oz123@:~/tmp> ls xx*
xx00 xx01 xx02
oz123@:~/tmp> ls xx00
xx00
oz123@:~/tmp> cat xx00
oz123@:~/tmp> cat xx01
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
oz123@:~/tmp> cat xx02
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
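The Python route mentioned above could look roughly like the sketch below. It assumes, as the csplit pattern does, that every block starts with a line beginning with "bla" (the placeholder from the sample data); the function names split_blocks and numeric_rows are hypothetical. Unlike csplit, it does not produce an empty first piece.

```python
def split_blocks(lines, marker="bla"):
    """Group lines into blocks, starting a new block at each line that begins with marker."""
    blocks = []
    for line in lines:
        # Start a new block at a marker line, or for any lines before the first marker.
        if line.startswith(marker) or not blocks:
            blocks.append([])
        blocks[-1].append(line)
    return blocks


def numeric_rows(block):
    """Return the rows of a block whose fields all parse as floats."""
    rows = []
    for line in block:
        try:
            row = [float(field) for field in line.split()]
        except ValueError:
            continue  # header line containing text
        if row:  # ignore empty lines
            rows.append(row)
    return rows


# Shortened stand-in for the real file, using rows from the question.
sample = [
    "bla bla\n",
    "Points : 4223\n",
    "name1 name1 name1\n",
    "1 60.102783 0.020013755\n",
    "1 60.107666 0.020006953\n",
    "bla bla\n",
    "Points : 4223\n",
    "\n",
    "1 60.112549 0.02000189\n",
]
blocks = split_blocks(sample)
tables = [numeric_rows(block) for block in blocks]
print(len(tables), [len(table) for table in tables])
```

Each entry of tables is a list of rows for one block and can be passed to numpy.array to get one array per block, which keeps the blocks separate even though their lengths differ.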
Upvotes: 0