Hiddenguy

Reputation: 537

Removing variables from numpy array

I have code that creates this plot, but I do not know how to delete the first data point from "canal 1" (the red line) and the last data point from "canal 3" (the blue line), which show up as those vertical lines. Each canal has 266336 records. The red vertical line is the first record and the blue one is the last. Can you help?

import iodata as io
import matplotlib.pyplot as plt
import numpy as np
import time

testInstance = io.InputConverter()
start = time.time()
conversionError = io.ConversionError()
#f = testInstance.convert(r"S:\Python\", 1", conversionError)
f = testInstance.convert(r"/Users/Hugo/20160401", "201604010000", 
conversionError)
end = time.time()
print("time elapsed " + str(end - start))

if conversionError.conversionSucces:
    print("Conversion successful")
else:
    print("Conversion failed: " + conversionError.conversionErrorLog)
print("Done!")

# Create a new subplot for two canals 1 & 3
a = np.amin(f.f)
filename = 'C:/Users/Hugo/20160401/201604010000.dat'
d = open(filename,'rb')
t = u"\u00b0"
headersize = 64
header = d.read(headersize)
ax1 = plt.subplot(211)
ax1.set_title(header[:16] + ', ' +                          # station name
     'Canals: '+header[32:33]+' and '+header[34:35]+ ', '   # canals
     +'Temp'+header[38:43]+t+'C'                            # temperature
    +', '+'Time:'+header[26:32]+', '+'Date'+' '+header[16:26])      # date

plt.ylabel('Pico Tesle [pT]')
plt.xlabel('Time [ms]')
plt.plot(f.f[0,], label='Canal 1', color='r', linewidth=0.75, linestyle="-")
plt.plot(f.f[1,], label='Canal 3', color='b', linewidth=0.75, linestyle="-")
plt.legend(loc='upper right', frameon=False)
plt.grid()
# Create a new subplot for FFT
plt.subplot(212)
plt.title('Fast Fourier Transform')
plt.ylabel('Power [a.u.]')
plt.xlabel('Frequency Hz')
FFTdata = np.sqrt(f.f[0,]*f.f[0,]+f.f[1,]*f.f[1,])**1
samples = FFTdata.size
duration = 300 # in seconds
Fs = float(samples)/duration # sampling frequency (sample/sec)
delta_t = 1.0/Fs
t = np.arange(0, samples, 1)*delta_t
FFTdata_freq = np.abs(np.fft.rfft(FFTdata))**2
freq = np.fft.rfftfreq(samples, d=delta_t)

# Printing data
plt.semilogy(freq, FFTdata_freq)
plt.grid()
#plt.savefig('S:/Hugo/'+"201604010000"+'.png', bbox_inches='tight')
plt.show()

Contents of f.f:

>>> print f.f[0,]
[ -59.57011259 -74.20675537 -90.53224156 ..., -1676.9703173 -1676.9703173 -1676.9703173 ]

>>> print f.f[1,] 
[ 1.48413511e+00 4.96417605e+00 8.39303992e+00 ..., -1.67697032e+03 -1.67697032e+03 -1.67697032e+03] 

iodata code:

import struct
import numpy as np

class ConversionError:
    def __init__(self):
        self.conversionSucces = True
        self.conversionErrorLog = "Correct"

    def reportFailure(self, errorlog):
        self.conversionSucces = False
        self.conversionErrorLog = errorlog

class DataStruct:
    def __init__(self,f,initTime,headerString):
        self.f = f
        self.initTime = initTime
        self.headerString = headerString

class InputConverter:
    def __init__(self):
        self.midAdc = 65536/2
        self.convFactor = 19.54

    def convert(self,filePath,fileName,conversionLog):
        try:
            d_file = open(filePath + "/" + fileName + ".dat", mode='rb')
        except IOError as e:
            conversionLog.reportFailure(e.strerror)
            return None  # nothing to read if the file could not be opened

        file = d_file.read()
        datalen = len(file)
        headerString = file[:43]
        initime, = struct.unpack('>H', file[48:50])
        expectedMatrixWidth = (datalen - 72)//4  # integer division: np.zeros needs an int
        outputMatrix = np.zeros((2, expectedMatrixWidth))
        index = 0
        print "Processing..."
        for i in range(64, datalen-8, 4):
            e1, e2 = struct.unpack('>HH',file[i:i+4])
            outputMatrix[0, index] = (e1 - self.midAdc)/self.convFactor
            outputMatrix[1, index] = (e2 - self.midAdc)/self.convFactor
            index += 1


        return DataStruct(outputMatrix,initime,headerString)

Upvotes: 3

Views: 224

Answers (2)

scrpy

Reputation: 1021

You could try using array slicing:

plt.plot(f.f[0,][1:], label='Canal 1', color='r', linewidth=0.75, linestyle="-")
plt.plot(f.f[1,][:-1], label='Canal 3', color='b', linewidth=0.75, linestyle="-")

Edit:

Because of the nature of the data, slicing off more than just the first/last data points is appropriate, as @Dascienz suggests in the comments. Something like this, where the first and last 50 data points are sliced off from both series:

plt.plot(f.f[0,][50:-50], label='Canal 1', color='r', linewidth=0.75, linestyle="-")
plt.plot(f.f[1,][50:-50], label='Canal 3', color='b', linewidth=0.75, linestyle="-")
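
One thing to watch with this kind of slice: once samples are dropped from the front, plotting the trimmed array on its own shifts everything left, and a slice like [n:-n] returns an empty array when n is 0. Below is a minimal sketch that keeps the original sample indices for the x-axis. The helper name trim_edges and the synthetic demo array are made up for illustration; they are not part of the question's code.

```python
import numpy as np

def trim_edges(data, n_start, n_end):
    """Drop n_start samples from the front and n_end from the back of each row.

    Returns the original sample indices alongside the trimmed data, so the
    x-axis still shows where each remaining sample sat in the recording.
    """
    data = np.asarray(data)
    stop = data.shape[-1] - n_end      # explicit stop index, so n_end == 0 also works
    x = np.arange(n_start, stop)       # original positions of the kept samples
    return x, data[..., n_start:stop]

# Synthetic stand-in for f.f: 2 channels, 10 samples each (hypothetical data).
demo = np.arange(20, dtype=float).reshape(2, 10)
x, trimmed = trim_edges(demo, 2, 3)
```

With the real data this would be something like x, trimmed = trim_edges(f.f, 50, 50), followed by plt.plot(x, trimmed[0], ...) and plt.plot(x, trimmed[1], ...).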

Upvotes: 1

scrpy

Reputation: 1021

Long-winded explanation of why my first answer didn't seem to have an effect...


Removing the first data point from canal 1 and the last data point from canal 3 will not get rid of the anomalies. Many of the data points contribute to them.

Look at the last three data points of f.f[0,] (canal 1, in red) and f.f[1,] (canal 3, in blue): they all have the same value, -1676.9703.... This explains the purple (i.e. both red and blue) spike on the right of the graph.
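
For what it's worth, that repeated value is exactly what the posted InputConverter produces for a raw reading of 0: (0 - 32768)/19.54 is about -1676.97. So the flat tail may be padding or missing samples rather than signal, though that is a guess and nothing in the question confirms it. A rough sketch for measuring how long the flat tail is, using a hypothetical helper trailing_run_length and made-up demo values:

```python
import numpy as np

MID_ADC = 65536 / 2      # same constant as in the posted InputConverter
CONV_FACTOR = 19.54      # same constant as in the posted InputConverter

def trailing_run_length(values):
    """Number of samples at the end of a 1-D array that equal the last sample."""
    values = np.asarray(values)
    if values.size == 0:
        return 0
    differs = np.flatnonzero(values != values[-1])  # indices that differ from the last value
    if differs.size == 0:
        return values.size                          # the whole array is one constant run
    return values.size - 1 - differs[-1]

# Value the converter yields for a raw reading of 0.
zero_reading = (0 - MID_ADC) / CONV_FACTOR

# Made-up series ending in three such readings.
demo = np.array([-59.6, -74.2, -90.5, 12.0, zero_reading, zero_reading, zero_reading])
n_flat = trailing_run_length(demo)
cleaned = demo[:demo.size - n_flat]   # drop the flat tail
```

The exact comparison with != is deliberate: identical raw readings go through the same arithmetic, so they come out as identical floats. On the real arrays, trailing_run_length(f.f[0,]) would tell you how many samples to cut from the end.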

Also, look at the first three values of f.f[0,] (canal 1, in red): they are roughly -60, -75 and -90. Clearly, getting rid of the very first one won't remove the anomalies on the left of the graph, where the value goes up past 500 and down below -500. Those values must occur at indices above 2 but still far below 50000, which is why they look like they occur at 0.
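
To see where those large values actually sit, you could list the indices that exceed some limit. This is only a sketch: the helper indices_beyond is hypothetical, the limit of 500 is read off the plot, and the demo values are invented.

```python
import numpy as np

def indices_beyond(values, limit):
    """Indices of samples whose absolute value exceeds `limit`."""
    return np.flatnonzero(np.abs(np.asarray(values)) > limit)

# Invented series: starts like canal 1, then swings past +/-500.
demo = np.array([-60.0, -75.0, -90.0, 520.0, -610.0, 30.0, 10.0, -20.0])
spikes = indices_beyond(demo, 500.0)
```

On the real data, indices_beyond(f.f[0,], 500) followed by a look at its .min() and .max() should show how many samples at each end are affected, which is a better basis for choosing slice bounds than guessing 50.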

In short, to remove the anomalies you need to clean up the data more carefully before plotting it, rather than just slicing off the first and/or last values (which is what my original answer did, and did correctly I believe).
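
One possible way to do that cleanup, as a sketch rather than a recommendation: replace out-of-range samples with NaN, since matplotlib does not draw line segments through NaN values. The helper mask_outliers and the limit of 500 are assumptions for illustration.

```python
import numpy as np

def mask_outliers(values, limit):
    """Return a float copy of `values` with samples beyond +/-limit set to NaN."""
    cleaned = np.array(values, dtype=float)    # np.array copies, so the input is untouched
    cleaned[np.abs(cleaned) > limit] = np.nan  # mark implausible samples as missing
    return cleaned

# Invented series with one spike and one flat-tail style value.
demo = np.array([10.0, 520.0, -15.0, -1676.97, 5.0])
cleaned = mask_outliers(demo, 500.0)
```

This only suits the time-series plot. NaN values propagate through np.fft.rfft, so for the FFT subplot the bad samples would have to be dropped or interpolated instead.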

Upvotes: 0
