Starter2
Starter2

Reputation: 57

remove string from text file keep float

I'm looking to remove lines with strings or empty lines in a text file. It looks like this. As you can see the header repeat it self throught the file. The numbers of lines with data vary from each block. I need it to import as an array in numpy. At first I had comma for decimal point at least I was able to change that.

I tried this but it doesn't work at all:

from types import StringType

z = open('D:\Desktop\cycle 1-20 20-50 kPa (dot).dat', 'r')
for line in z.readlines():
    for x in z:
        if type(z.readline(x)) is StringType:
            print line


z.close()

Example of data:

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

Upvotes: 3

Views: 403

Answers (2)

Chris
Chris

Reputation: 18022

Python will read all file elements as strings initially unless you cast them, so your method won't work.

Your best bet is probably to use a regular expression to filter out lines with non-data characters in them.

f = open("datafile")
for line in f:
  #Catch everything that has a non-number/space in it
  if re.search("[^-0-9.\s]",line): 
     continue
  # Catch empty lines
  if len(line.strip()) == 0:
     continue
  # Keep the rest
  print(line)

f.close()

Upvotes: 4

oz123
oz123

Reputation: 28868

Why are you not using numpy.loadtxt ? it has a very nice interface exactly for these cases.
See the documentation here

yourArry = np.loadtxt(open('yourfilename.txt', skiprows=7)

Also, since you have the heder (which should be header as an something which can be found in the top of a file) you could split you file into multiple files. You could do it with Python, or you could use the UNIX command csplit. How to do it, and what you will get:

oz123@:~/tmp> csplit -k data.txt   '/^bla/' '{*}'
0
787
786
oz123@:~/tmp> ls xx
xx00  xx01  xx02
oz123@:~/tmp> ls xx00
xx00
oz123@:~/tmp> cat xx00
oz123@:~/tmp> cat xx01
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

oz123@:~/tmp> cat xx02
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

Upvotes: 0

Related Questions